SlimYOLOv4: lightweight object detector based on YOLOv4

Ding, Peng; Qian, Huaming; Chu, Shuai

doi:10.1007/s11554-022-01201-7

Original Research Paper
Published: 10 February 2022

Volume 19, pages 487–498, (2022)
Journal of Real-Time Image Processing

Abstract

Object detection is a valuable but challenging technology in computer vision research. Although existing methods attain satisfactory results on high-performance computers, their large number of network parameters places a heavy computational burden on mobile devices with limited computing power. Existing methods usually face a trade-off between accuracy and speed, and poor detection performance makes detection tasks difficult to deploy. This paper optimizes the classic YOLOv4 and proposes the SlimYOLOv4 network structure. First, we change the feature extraction network from CSPDarknet53 to MobileNetV2. Second, the more suitable DO-DConv (depthwise over-parameterized depthwise convolutional layer) and DSC (depthwise separable convolution) replace the standard convolutions in the network structure, which greatly reduces computation and improves network performance. Finally, Leaky ReLU is replaced by ReLU6 to improve the numerical resolution. We evaluate SlimYOLOv4 on the Pascal VOC07+12 and MS COCO datasets. The experimental results show that our method has only 12.6% of the parameters of YOLOv4 and runs 1.59 times as fast, reaching 60.19 frames per second (FPS), which is suitable for real-time detection. It achieves 70.83% mean average precision (mAP) on Pascal VOC07+12 and 29.2% mAP on MS COCO. As a lightweight object detector, it balances speed and accuracy and is comparable to state-of-the-art detectors.
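
To illustrate why swapping standard convolutions for DSC shrinks a network, the following minimal sketch counts the weights of one layer in both forms and also shows the ReLU6 activation mentioned above. It is not the authors' implementation: the function names and the layer size (256 input channels, 512 output channels, 3×3 kernel) are hypothetical values chosen for illustration, biases are ignored, and DO-DConv is not covered.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def dsc_params(c_in, c_out, k):
    """Weight count of a depthwise separable convolution (bias omitted):
    one k x k depthwise filter per input channel, followed by a
    1 x 1 pointwise convolution that mixes channels."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def relu6(x):
    """ReLU6 clips the activation to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


# Hypothetical layer size, assumed for illustration only.
c_in, c_out, k = 256, 512, 3
std = standard_conv_params(c_in, c_out, k)
dsc = dsc_params(c_in, c_out, k)

print(f"standard conv weights: {std}")
print(f"DSC weights:           {dsc}")
print(f"DSC / standard ratio:  {dsc / std:.3f}")
print([relu6(v) for v in (-1.0, 3.0, 10.0)])
```

For this single layer the ratio equals 1/c_out + 1/k², so the saving grows with the kernel size and the number of output channels. The whole-network figure of 12.6% reported in the abstract also reflects the backbone change and is not reproduced by this per-layer count.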



Acknowledgements

This work is supported by the Key-Area Research and Development Program of Guangdong Province under Grant 2020B0909020001 and by the National Natural Science Foundation of China under Grant No. 61573113.

Author information

Authors and Affiliations

College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150001, China
Peng Ding, Huaming Qian & Shuai Chu

Corresponding author

Correspondence to Huaming Qian.

About this article

Cite this article

Ding, P., Qian, H. & Chu, S. SlimYOLOv4: lightweight object detector based on YOLOv4. J Real-Time Image Proc 19, 487–498 (2022). https://doi.org/10.1007/s11554-022-01201-7

Received: 08 September 2021
Accepted: 13 January 2022
Published: 10 February 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11554-022-01201-7
