Review
Version 1
This version is not peer-reviewed
Advances in the Neural Network Quantization: A Comprehensive Review
: Received: 1 July 2024 / Approved: 1 July 2024 / Online: 1 July 2024 (15:03:00 CEST)
How to cite: Wei, L.; Ma, Z.; Yang, C.; Yao, Q. Advances in the Neural Network Quantization: A Comprehensive Review. Preprints 2024, 2024070076. https://doi.org/10.20944/preprints202407.0076.v1
Abstract
Artificial intelligence technologies based on deep convolutional neural networks and large language models have made significant breakthroughs in many tasks such as image recognition, target detection, semantic segmentation and natural language processing, but they also face a contradiction between the high computational demands of the algorithms and the limited resources available for deployment. Quantization, which converts floating-point neural networks into low-bit-width integer networks, is an essential technique for efficient deployment and cost reduction in edge computing. This paper analyzes various existing quantization methods, showcases the deployment accuracy of advanced techniques, and discusses the future challenges and trends in this domain.
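To make the core idea concrete, the conversion from floating-point to low-bit-width integers described above can be sketched with uniform affine quantization. This is a generic illustration, not a method from the paper: the scale and zero-point are simply derived from the tensor's min/max range.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Uniform affine quantization: map float values to low-bit integers.

    Illustrative sketch (not the paper's method): scale and zero-point
    are derived from the tensor's observed min/max range.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, z = quantize(weights)
recovered = dequantize(q, s, z)
```

Each float is stored in a single byte instead of four, at the cost of a rounding error bounded by the scale; the practical methods surveyed in this review refine how the scale and zero-point are chosen and how that error is controlled.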
Keywords
Quantization; neural network; large language model; deployment; accuracy
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.