Abstract
To address the difficulty of detecting multi-scale vehicles in traffic surveillance video and the tendency to miss overlapping vehicles, we propose an improved vehicle detection method based on YOLOv3. To extract features more efficiently, we first use the inverted residuals technique to improve the convolutional layers of YOLOv3. To handle multi-scale vehicle detection, three spatial pyramid pooling (SPP) modules are added, one before each YOLO layer, to obtain multi-scale information. To cope with overlapping vehicles in traffic videos, soft non-maximum suppression (Soft-NMS) replaces non-maximum suppression (NMS), reducing the number of predicted boxes lost to vehicle overlap. Experimental results on the Car dataset and the KITTI dataset confirm that the proposed method achieves good detection results for vehicles of various scales in various scenes, and is therefore better suited to practical applications.
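The effect of replacing NMS with Soft-NMS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which Soft-NMS variant or parameters were used, so the Gaussian score decay, the values of `sigma` and `score_threshold`, and the function names `iou` and `soft_nms` are all assumptions made for illustration. Boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS (assumed variant; sigma and threshold are illustrative).

    Returns a list of (box_index, rescored_score) in selection order.
    """
    # Each candidate is a mutable [index, current_score] pair.
    candidates = [[i, float(s)] for i, s in enumerate(scores)]
    kept = []
    while candidates:
        # Select the candidate with the highest current score.
        best = max(candidates, key=lambda c: c[1])
        candidates.remove(best)
        kept.append((best[0], best[1]))
        # Decay the scores of the remaining boxes according to their overlap
        # with the selected box, instead of discarding them outright.
        for c in candidates:
            overlap = iou(boxes[best[0]], boxes[c[0]])
            c[1] *= math.exp(-(overlap ** 2) / sigma)
        # Drop only the boxes whose score has decayed below the threshold.
        candidates = [c for c in candidates if c[1] >= score_threshold]
    return kept
```

In this sketch a box that heavily overlaps a higher-scoring box keeps a reduced score rather than being deleted, which is how Soft-NMS can retain a partially occluded vehicle that hard NMS with a fixed IoU threshold would suppress.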
Acknowledgements
The authors are grateful for collaborative funding support from the Natural Science Foundation of Shandong Province, China (ZR2018MEE008) and the Key Research and Development Program of Shandong Province, China (2017GSF20115).
About this article
Cite this article
Mao, QC., Sun, HM., Zuo, LQ. et al. Finding every car: a traffic surveillance multi-scale vehicle object detection method. Appl Intell 50, 3125–3136 (2020). https://doi.org/10.1007/s10489-020-01704-5
DOI: https://doi.org/10.1007/s10489-020-01704-5