Compression and Speed-up of Convolutional Neural Networks Through Dimensionality Reduction for Efficient Inference on Embedded Multiprocessor

Published in: Journal of Signal Processing Systems

Abstract

The computational complexity of state-of-the-art Convolutional Neural Networks (CNNs) makes their integration into embedded systems with low power consumption requirements a challenging task, one that calls for the joint design and adaptation of hardware and algorithms. In this paper, we propose a new general CNN compression method that reduces both the number of parameters and the number of operations. To this end, we introduce a new Principal Component Analysis (PCA) based compression, which relies on an optimal transformation (in the mean-squared-error sense) of the filters of each layer into a new representation space, where the convolutions are then applied. Compression is achieved by reducing the dimension of this new representation space, with arbitrarily controlled degradation of the accuracy of the new CNN. PCA compression is evaluated on multiple state-of-the-art networks and datasets and applied to a binary face classification network. To show the versatility of the method and its usefulness in adapting a CNN to a hardware computing system, the compressed face classification network is implemented and evaluated on a custom embedded multiprocessor. Results show, for example, that an overall compression rate of 2× can be achieved on a compact ResNet-32 model on the CIFAR-10 dataset with only a 2% loss of network accuracy, while compression rates of up to 11× can be achieved on specific layers with negligible accuracy loss.
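The article does not publish reference code, so the following is a minimal NumPy sketch of the compression step as the abstract describes it: the filters of one layer are treated as points in filter space and projected onto a reduced PCA basis (the optimal linear dimensionality reduction in the mean-squared-error sense). All function and variable names are illustrative assumptions, not the authors' implementation.

import numpy as np

def pca_compress_filters(W, d):
    # Hypothetical helper, not from the paper.
    # W: conv weights of shape (C_out, C_in, k, k); keep d < C_out components.
    # Returns d basis filters, per-filter coefficients, and the mean filter,
    # such that W[i] ~= mean + sum_j coeffs[i, j] * basis[j].
    C_out, C_in, k, _ = W.shape
    F = W.reshape(C_out, -1)                 # one row per flattened filter
    mean = F.mean(axis=0)                    # centre of the filter cloud
    Fc = F - mean
    # Right singular vectors of the centred matrix are the PCA directions.
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    basis = Vt[:d]                           # (d, C_in*k*k) top-d components
    coeffs = Fc @ basis.T                    # (C_out, d) coordinates
    return basis.reshape(d, C_in, k, k), coeffs, mean.reshape(C_in, k, k)

At inference time, the input would be convolved with the d basis filters (plus the mean filter once), and the d resulting feature maps recombined into C_out outputs by a 1×1 convolution whose weights are the coefficients. The per-layer parameter count then drops from C_out·C_in·k² to roughly d·(C_in·k² + C_out), which is consistent with the compression rates the abstract reports when d can be made small.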

Author information

Correspondence to Lucas Fernández Brillet.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fernández Brillet, L., Leclaire, N., Mancini, S. et al. Compression and Speed-up of Convolutional Neural Networks Through Dimensionality Reduction for Efficient Inference on Embedded Multiprocessor. J Sign Process Syst 94, 263–281 (2022). https://doi.org/10.1007/s11265-020-01616-0
