Finding every car: a traffic surveillance multi-scale vehicle object detection method

Applied Intelligence

Abstract

Multi-scale vehicle objects in traffic surveillance video are difficult to detect, and overlapping objects are prone to missed detection. To address these problems, we propose an improved vehicle object detection method based on YOLOv3. To extract features more efficiently, we first use the inverted residuals technique to improve the convolutional layers of YOLOv3. To handle multi-scale vehicle objects, three spatial pyramid pooling (SPP) modules are added, one before each YOLO layer, to obtain multi-scale information. To cope with overlapping vehicles in traffic videos, soft non-maximum suppression (Soft-NMS) replaces non-maximum suppression (NMS), which reduces the number of predicted boxes that are lost when vehicles overlap. Experimental results on the Car dataset and the KITTI dataset show that the proposed method achieves good detection results for vehicle objects of various scales in various scenes, making it better suited to practical applications.
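The difference between NMS and Soft-NMS described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which Soft-NMS variant or parameters were used, so the Gaussian score decay, the `sigma` and `score_thresh` values, and the function names `iou`, `hard_nms`, and `soft_nms` are all assumptions made for illustration. Boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_thresh=0.5):
    """Classic greedy NMS: discard any box that overlaps a kept box too much."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (assumed variant): decay scores instead of discarding.

    Returns (original_index, rescored_score) pairs in selection order.
    """
    candidates = list(enumerate(scores))
    keep = []
    while candidates:
        # Select the highest-scoring remaining candidate.
        best = max(range(len(candidates)), key=lambda k: candidates[k][1])
        best_idx, best_score = candidates.pop(best)
        keep.append((best_idx, best_score))
        # Decay the remaining scores according to their overlap with it.
        decayed = []
        for i, s in candidates:
            overlap = iou(boxes[best_idx], boxes[i])
            s *= math.exp(-(overlap * overlap) / sigma)
            if s >= score_thresh:  # drop only boxes whose score has collapsed
                decayed.append((i, s))
        candidates = decayed
    return keep


if __name__ == "__main__":
    # Two heavily overlapping vehicles plus one isolated vehicle (made-up boxes).
    boxes = [(0, 0, 100, 100), (20, 0, 120, 100), (300, 300, 400, 400)]
    scores = [0.9, 0.8, 0.7]
    print("hard NMS keeps:", hard_nms(boxes, scores))
    print("Soft-NMS keeps:", soft_nms(boxes, scores))
```

With hard NMS, the second of two strongly overlapping boxes is removed outright, which is how an occluded vehicle can be missed. With the Soft-NMS sketch, the same box survives with a reduced score, so a downstream confidence threshold decides whether it is reported.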



Acknowledgements

The authors are grateful for collaborative funding support from the Natural Science Foundation of Shandong Province, China (ZR2018MEE008) and the Key Research and Development Program of Shandong Province, China (2017GSF20115).

Author information

Corresponding authors

Correspondence to Hong-Mei Sun or Rui-Sheng Jia.



About this article

Cite this article

Mao, QC., Sun, HM., Zuo, LQ. et al. Finding every car: a traffic surveillance multi-scale vehicle object detection method. Appl Intell 50, 3125–3136 (2020). https://doi.org/10.1007/s10489-020-01704-5


  • DOI: https://doi.org/10.1007/s10489-020-01704-5
