SlimYOLOv4: lightweight object detector based on YOLOv4

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

Object detection is a valuable but challenging technology in computer vision research. Although existing methods can attain satisfactory results on high-performance computers, their large number of network parameters places a heavy computational burden on mobile devices with limited computing power. Existing methods usually face a trade-off between accuracy and speed, and poor detection performance makes detection tasks difficult to deploy. This paper optimizes the classic YOLOv4 and proposes the SlimYOLOv4 network structure. First, we change the feature extraction network from CSPDarknet53 to MobileNetV2. Second, DO-DConv (depthwise over-parameterized depthwise convolutional layer) and DSC (depthwise separable convolution) are selected to replace the standard convolutions in the network structure, which greatly reduces computation and improves network performance. Finally, Leaky ReLU is replaced by ReLU6 to improve the numerical resolution. We evaluate SlimYOLOv4 on the Pascal VOC07+12 and MS COCO datasets. The experimental results demonstrate that our method has only 12.6% of the parameters of YOLOv4 and runs 1.59 times as fast, reaching 60.19 frames per second (FPS), which is suitable for real-time detection. It achieves 70.83% mean average precision (mAP) on Pascal VOC07+12 and 29.2% mAP on MS COCO. As a lightweight object detector, it balances speed and accuracy and is comparable to state-of-the-art detectors.
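
To make the parameter savings from depthwise separable convolution and the difference between the two activations concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the layer size (256 input channels, 512 output channels, 3 x 3 kernel) and the Leaky ReLU slope of 0.1 are assumptions chosen for illustration, and biases are ignored. The last line only divides the reported 60.19 FPS by the reported 1.59 speed-up to show the YOLOv4 frame rate those two figures imply.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def dsc_params(c_in, c_out, k):
    """Weight count of a depthwise separable convolution:
    one k x k depthwise filter per input channel, followed by
    a 1 x 1 pointwise convolution that mixes channels."""
    return k * k * c_in + c_in * c_out


def leaky_relu(x, slope=0.1):
    """Leaky ReLU; the slope of 0.1 is an assumed value."""
    return x if x >= 0 else slope * x


def relu6(x):
    """ReLU6 clips activations to the fixed range [0, 6]."""
    return min(max(0.0, x), 6.0)


# Hypothetical layer size, chosen only for illustration.
c_in, c_out, k = 256, 512, 3
standard = conv_params(c_in, c_out, k)
separable = dsc_params(c_in, c_out, k)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(f"standard conv weights:  {standard}")
print(f"separable conv weights: {separable}")
print(f"separable / standard:   {ratio:.3f}")

# ReLU6 keeps large activations bounded, whereas Leaky ReLU does not.
for x in (-3.0, 2.5, 10.0):
    print(x, leaky_relu(x), relu6(x))

# YOLOv4 frame rate implied by the figures reported in the abstract.
implied_yolov4_fps = 60.19 / 1.59
print(f"implied YOLOv4 FPS: {implied_yolov4_fps:.2f}")
```

For this assumed layer the separable form needs roughly one ninth of the weights of the standard convolution. The overall 12.6% figure reported above also reflects the backbone change and the other modifications, so it should not be read off this single-layer ratio.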

Acknowledgements

This work is supported by the Key-Area Research and Development Program of Guangdong Province under Grant 2020B0909020001 and the National Natural Science Foundation of China under Grant No. 61573113.

Author information

Corresponding author

Correspondence to Huaming Qian.

About this article

Cite this article

Ding, P., Qian, H. & Chu, S. SlimYOLOv4: lightweight object detector based on YOLOv4. J Real-Time Image Proc 19, 487–498 (2022). https://doi.org/10.1007/s11554-022-01201-7

  • DOI: https://doi.org/10.1007/s11554-022-01201-7
