
Local Enhancement and Bidirectional Feature Refinement Network for Single-Shot Detector

Cognitive Computation

Abstract

Benefiting from multi-scale feature pyramid methods, single-stage object detectors have recently achieved promising accuracy and fast inference speed. However, most existing feature pyramid detection techniques describe the complex contextual relationships across scales only in a simple way. They lack effective modules that adaptively extend appropriate semantic information from deeper layers, and they often ignore the finer spatial localization cues from lower layers. In this paper, we present a Local Enhancement and Bidirectional Feature Refinement Network (LFBFR), which includes two optimization methods to achieve remarkable improvements in detection accuracy. First, to make the backbone more suitable for the detection task, we modify the pre-trained classification backbone to mitigate the loss of detail in small objects caused by the successive reduction of image resolution. Second, we propose a Bidirectional Feature Refinement Pyramid, which effectively exploits the inter-channel relationships of higher-level features and the fine appearance cues of lower-level features by means of an attention residual refinement module and a feature reuse module. Finally, to assess the performance of the proposed LFBFR, we design a powerful end-to-end single-stage detector called LFBFR-SSD by embedding it into the SSD framework. Extensive experiments on the PASCAL VOC and MS COCO datasets verify that LFBFR-SSD outperforms many state-of-the-art detectors while maintaining real-time speed.
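The abstract does not specify how the attention residual refinement module is built. To make the idea of exploiting inter-channel relationships of higher-level features concrete, the following is a minimal sketch that assumes a squeeze-and-excitation-style channel gate combined with a residual connection. The function name, weight shapes, and reduction ratio are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def channel_attention_residual(feature, w1, w2):
    """SE-style channel attention with a residual connection (illustrative only).

    feature: list of C channels, each an H x W list of lists of floats.
    w1: (C // r) x C weight matrix; w2: C x (C // r) weight matrix.
    All names and shapes are assumptions, not the paper's definition.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature
    ]
    # Excitation: a bottleneck (ReLU, then sigmoid) models inter-channel relations.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Residual refinement: x + gate * x keeps the identity path intact.
    return [
        [[v + g * v for v in row] for row in ch] for ch, g in zip(feature, gates)
    ]


def make_weights(rows, cols, rng):
    """Small random weight matrix standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
channels, height, width, reduction = 4, 3, 3, 2
feature = [
    [[rng.random() for _ in range(width)] for _ in range(height)]
    for _ in range(channels)
]
w1 = make_weights(channels // reduction, channels, rng)
w2 = make_weights(channels, channels // reduction, rng)
refined = channel_attention_residual(feature, w1, w2)
```

Because each gate lies strictly between 0 and 1, every refined activation in this sketch stays between the input value and twice the input value, so the module can only re-weight channels relative to one another and never suppresses the original signal entirely.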



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61371156) and the Key R&D Program of Anhui Province (Grant No. 201904d07020118). The authors would like to thank the anonymous reviewers for their helpful and constructive comments and suggestions regarding this manuscript.

Author information

Corresponding author

Correspondence to Shu Zhan.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies that used human participants or animals.

About this article

Cite this article

Ouyang, P., Zhu, J., Fan, C. et al. Local Enhancement and Bidirectional Feature Refinement Network for Single-Shot Detector. Cogn Comput 14, 1107–1122 (2022). https://doi.org/10.1007/s12559-020-09814-5

  • DOI: https://doi.org/10.1007/s12559-020-09814-5
