
Efficient Design of Pruned Convolutional Neural Networks on FPGA

Journal of Signal Processing Systems

Abstract

Convolutional Neural Networks (CNNs) have improved several computer vision applications, such as object detection and classification, compared to other machine learning algorithms. Running these models on edge computing devices close to the data sources is attracting the attention of the community, since it avoids high-latency communication of private data to the cloud and permits real-time decisions, turning these systems into smart embedded devices. However, CNN inference is computationally very demanding and requires a large amount of memory, both of which are scarce in edge devices compared to a cloud data center. In this paper, we propose an architecture for the inference of pruned convolutional neural networks on FPGAs of any density. A configurable block pruning method is proposed together with an architecture that supports the efficient execution of pruned networks. In addition, pruning and batching are studied together to determine how they influence each other. With the proposed architecture, we run CNN inference with an average performance of 322 GOPs for 8-bit data on a XC7Z020 FPGA. Running AlexNet, the proposed architecture processes 240 images/s on a ZYNQ7020 and 775 images/s on a ZYNQ7045 with only 1.2% accuracy degradation.
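The paper's exact block-pruning criterion and block granularity are not detailed in this preview, so the following is a minimal NumPy sketch of the general idea only, assuming fixed-size blocks of consecutive weights ranked by L1 magnitude; the function name block_prune, the 8-weight block size, and the 50% sparsity target are illustrative assumptions, not the authors' method.

```python
import numpy as np

def block_prune(weights: np.ndarray, block_size: int, sparsity: float) -> np.ndarray:
    """Zero out whole blocks of consecutive weights.

    Blocks are ranked by L1 norm and the fraction `sparsity` with the
    smallest norms is zeroed, so the surviving non-zeros keep a regular,
    hardware-friendly layout.
    """
    flat = weights.reshape(-1)
    pad = (-flat.size) % block_size                 # pad to a multiple of block_size
    padded = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    blocks = padded.reshape(-1, block_size)

    scores = np.abs(blocks).sum(axis=1)             # L1 norm of each block
    n_prune = int(sparsity * blocks.shape[0])
    if n_prune > 0:
        blocks[np.argsort(scores)[:n_prune]] = 0.0  # drop the weakest blocks

    return blocks.reshape(-1)[:flat.size].reshape(weights.shape)

# Example: a bank of 64 3x3 conv kernels over 32 input channels, 50% block sparsity.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32, 3, 3)).astype(np.float32)
w_pruned = block_prune(w, block_size=8, sparsity=0.5)
print(f"non-zeros kept: {np.count_nonzero(w_pruned) / w.size:.2%}")
```

Pruning whole blocks rather than individual weights keeps the non-zero pattern regular, which is what allows a fixed FPGA datapath to skip pruned blocks without per-weight index bookkeeping.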

Acknowledgments

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) under reference UIDB/50021/2020, and by project IPL/IDI&CA/2020/TRAINEE/ISEL through Instituto Politécnico de Lisboa.

Author information

Corresponding author

Correspondence to Mário Véstias.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Véstias, M. Efficient Design of Pruned Convolutional Neural Networks on FPGA. J Sign Process Syst 93, 531–544 (2021). https://doi.org/10.1007/s11265-020-01606-2
