Abstract
Deep neural networks have achieved state-of-the-art performance in a wide range of scenarios, such as natural language processing, object detection, image classification, and speech recognition. Despite these impressive results across machine learning tasks, neural network models remain computationally expensive and memory intensive to train and store, which limits their deployment in mobile service scenarios. How to simplify and accelerate neural networks is therefore a crucial research topic. To address this issue, we propose "Bit-Quantized-Net" (BQ-Net), which compresses deep neural networks at both the training phase and test-time inference, and further reduces model size by compressing the bit-quantized weights. Specifically, training or testing a plain neural network model requires tens of millions of y = wx + b computations. In BQ-Net, however, the computation y = wx + b is approximated by y = sign(w)(x ≫ |w|) + b during forward propagation. That is, BQ-Net trains the network with bit-quantized weights during forward propagation, while retaining the full-precision weights for gradient accumulation during backward propagation. Finally, we apply Huffman coding to encode the bit-shift weights, which further compresses the model. Extensive experiments on three real data sets (MNIST, CIFAR-10, SVHN) show that BQ-Net achieves 10-14× model compression.
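The forward approximation described in the abstract replaces each multiplication wx with a sign flip and a right bit shift, where the shift amount plays the role of |w|. A minimal sketch of this idea, assuming non-negative integer activations and weights quantized to the nearest power of two (function names are illustrative, not from the paper):

```python
import numpy as np

def quantize_shift(w):
    """Map a full-precision weight w to (sign, shift), so that
    |w| is approximated by 2**(-shift). Assumes 0 < |w| <= 1."""
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    shift = int(round(-np.log2(abs(w))))
    return sign, shift

def bq_forward(x, w, b):
    """Approximate y = w*x + b by y = sign(w) * (x >> shift) + b
    for an integer activation x, as in the BQ-Net forward pass."""
    sign, shift = quantize_shift(w)
    return sign * (x >> shift) + b

# Example: w = 0.25 is exactly 2**-2, so w*x becomes x >> 2.
y = bq_forward(64, 0.25, 3)  # (64 >> 2) + 3 = 19
```

During backward propagation the full-precision w would still receive the gradient updates; only the forward computation uses the quantized (sign, shift) pair, which is also what Huffman coding would later compress.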
Acknowledgements
Dianhui Chu (cdh@hit.edu.cn) and Jinhui Zhu (csjhzhu@scut.edu.cn) are co-corresponding authors. This work was supported in part by the National Key Research and Development Program of China (No. 2018YFB1402500), the National Natural Science Foundation of China (No. 61902090, 61772159), and a University Co-construction Project.
Ethics declarations
Conflict of interests
The authors declare no conflict of interest. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work entitled “Bit-Quantized-Net: An Effective Deep Neural Networks Compression Method”.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Li, C., Du, Q., Xu, X. et al. Bit-Quantized-Net: An Effective Method for Compressing Deep Neural Networks. Mobile Netw Appl 26, 104–113 (2021). https://doi.org/10.1007/s11036-020-01687-0