Learning efficient single stage pedestrian detection by squeeze-and-excitation network

  • Original Article
Neural Computing and Applications

Abstract

Pedestrian detection plays a pivotal role in computer vision. Recently, deep convolutional neural networks (CNNs) have been shown to outperform hand-crafted methods in object detection, with the single shot multibox detector (SSD) being one of the state-of-the-art methods in terms of both speed and accuracy. In this paper, we propose a novel framework that performs pedestrian detection by considering not only local features but also global information, which is incorporated into the features to make them more discriminative for this task. Specifically, we first integrate a feature pyramid network into the SSD detection framework. Next, a squeeze-and-excitation network is proposed to encode global information. Hence, the features become more focused on pedestrians, in particular those that are small in scale or occluded. We further introduce a network-in-network fusion module, which enhances the features by incorporating local details. In this way, our framework is able to suppress background information and highlight pedestrian elements. Experimental results show that the proposed framework achieves detection results comparable to state-of-the-art methods and runs at an average of 17 frames per second (fps) on an NVIDIA TITAN X GPU with an image size of \(600\times 600\).
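To illustrate how a squeeze-and-excitation step can encode global information into features, the following is a minimal sketch of the standard squeeze-and-excitation block (Hu et al., 2018) in plain Python. It is not the exact module used in this article: the function name `se_block`, the feature-map size, the reduction ratio, and the random weights (stand-ins for learned parameters) are all assumptions made for illustration.

```python
import math
import random


def se_block(features, w1, b1, w2, b2):
    """Reweight the channels of `features`, a list of C channels (each an H x W list of lists)."""
    # Squeeze: global average pooling reduces each channel to a single scalar.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)) + b)
              for row, b in zip(w1, b1)]
    # Excitation, step 2: fully connected layer followed by a sigmoid,
    # producing one gate in (0, 1) per channel.
    gates = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch] for ch, g in zip(features, gates)]
    return scaled, gates


# Illustrative (hypothetical) sizes: channels, height, width, reduction ratio.
random.seed(0)
C, H, W, R = 8, 4, 4, 4
features = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
# Random weights stand in for parameters that would normally be learned.
w1 = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // R)]
b1 = [0.0] * (C // R)
w2 = [[random.gauss(0, 0.1) for _ in range(C // R)] for _ in range(C)]
b2 = [0.0] * C

scaled, gates = se_block(features, w1, b1, w2, b2)
```

Because each gate is computed from the spatial average of the whole feature map, every location in a channel is rescaled using image-wide context, which is the sense in which global information enters otherwise local convolutional features.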



Acknowledgements

We thank the anonymous editor and reviewers for their careful reading and many insightful comments and suggestions.

Funding

This work has been supported by the National Natural Science Foundation of China (61873246).

Author information

Corresponding author

Correspondence to Yong Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ding, L., Wang, Y., Laganière, R. et al. Learning efficient single stage pedestrian detection by squeeze-and-excitation network. Neural Comput & Applic 33, 16697–16712 (2021). https://doi.org/10.1007/s00521-021-06265-3

  • DOI: https://doi.org/10.1007/s00521-021-06265-3
