Learning efficient single stage pedestrian detection by squeeze-and-excitation network

Ding, Lu; Wang, Yong; Laganière, Robert; Luo, Xinbin; Huang, Dan; Zhang, Huanlong

doi:10.1007/s00521-021-06265-3

Original Article
Published: 07 July 2021

Volume 33, pages 16697–16712 (2021)

Neural Computing and Applications

Abstract

Pedestrian detection plays a pivotal role in computer vision. Recently, deep convolutional neural networks (CNNs) have been shown to achieve appealing performance in object detection compared with hand-crafted methods, with the single shot multibox detector (SSD) being one of the state-of-the-art methods in terms of both speed and accuracy. In this paper, we propose a novel framework that performs pedestrian detection by considering not only local features but also global information incorporated into the features, which makes them more discriminative for this task. Specifically, we first integrate a feature pyramid network (FPN) into the SSD detection framework. Next, a squeeze-and-excitation (SE) network is introduced to encode global information, so that the features focus more on pedestrians, in particular small-scale and occluded ones. We further introduce a network-in-network fusion module, which enhances the features by incorporating local details. In this way, our framework is able to suppress background information and highlight pedestrian elements. Experimental results show that the proposed framework achieves detection results comparable to state-of-the-art methods and runs at an average of 17 frames per second (fps) on an NVIDIA TITAN X GPU with an image size of 600 × 600.
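The squeeze-and-excitation step mentioned above builds on the generic SE block of Hu, Shen and Sun (CVPR 2018): each channel is squeezed to a single number by global average pooling, two fully connected layers (a ReLU in between, a sigmoid at the end) turn these numbers into per-channel weights in (0, 1), and the input feature map is rescaled channel by channel. The pure-Python sketch below illustrates only that generic block; it is not the authors' implementation. The helper names (`squeeze`, `excite`, `se_block`, `random_se_params`), the reduction ratio and the random weight initialisation are hypothetical, and where the paper places the block inside its FPN-augmented SSD is not described in this abstract.

```python
import math
import random
from typing import List, Tuple

# Hypothetical layout: a feature map is a C x H x W nested list of floats.
FeatureMap = List[List[List[float]]]
Matrix = List[List[float]]


def squeeze(x: FeatureMap) -> List[float]:
    """Squeeze: global average pooling, one descriptor per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def dense(v: List[float], w: Matrix, b: List[float]) -> List[float]:
    """Fully connected layer; w has shape (out_features, in_features)."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bj for row, bj in zip(w, b)]


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def excite(z: List[float], w1: Matrix, b1: List[float],
           w2: Matrix, b2: List[float]) -> List[float]:
    """Excitation: FC -> ReLU -> FC -> sigmoid gives channel weights in (0, 1)."""
    hidden = [max(0.0, u) for u in dense(z, w1, b1)]
    return [sigmoid(u) for u in dense(hidden, w2, b2)]


def se_block(x: FeatureMap, w1: Matrix, b1: List[float],
             w2: Matrix, b2: List[float]) -> FeatureMap:
    """Rescale every channel of x by its excitation weight."""
    scales = excite(squeeze(x), w1, b1, w2, b2)
    return [[[s * val for val in row] for row in ch] for ch, s in zip(x, scales)]


def random_se_params(channels: int, reduction: int, seed: int = 0
                     ) -> Tuple[Matrix, List[float], Matrix, List[float]]:
    """Hypothetical random init of a C -> C // reduction -> C bottleneck."""
    rng = random.Random(seed)
    hidden = max(1, channels // reduction)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(channels)]
    return w1, [0.0] * hidden, w2, [0.0] * channels


if __name__ == "__main__":
    rng = random.Random(1)
    C, H, W = 4, 3, 3
    x = [[[1.0 + rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
    y = se_block(x, *random_se_params(C, reduction=2))
    # The output/input ratio of any pixel is that channel's SE weight.
    print([round(y[c][0][0] / x[c][0][0], 3) for c in range(C)])
```

With all-zero weights every sigmoid outputs 0.5, so the block simply halves its input; trained weights instead learn which channels to emphasise, which is how global context can suppress background channels. Nested lists keep the sketch dependency-free; a practical detector would use a tensor library.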

Acknowledgements

We thank the anonymous editor and reviewers for their careful reading and many insightful comments and suggestions.

Funding

This work has been supported by the National Natural Science Foundation of China (61873246).

Author information

Authors and Affiliations

School of Electrical Engineering, Guangxi University, Nanning, Guangxi, China
Lu Ding & Dan Huang
School of Aeronautics and Astronautics, Sun Yat-Sen University, Shenzhen, China
Yong Wang (ORCID: orcid.org/0000-0002-0578-8023)
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
Robert Laganière
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
Xinbin Luo
College of Electric and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China
Huanlong Zhang

Corresponding author

Correspondence to Yong Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ding, L., Wang, Y., Laganière, R. et al. Learning efficient single stage pedestrian detection by squeeze-and-excitation network. Neural Comput & Applic 33, 16697–16712 (2021). https://doi.org/10.1007/s00521-021-06265-3

Received: 19 August 2020
Accepted: 26 June 2021
Published: 07 July 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s00521-021-06265-3
