Pose estimation at night in infrared images using a lightweight multi-stage attention network

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Human keypoint detection is a fundamental task in computer vision and a prerequisite for human action recognition, behavior analysis and human–computer interaction. Since most abnormal actions occur at night, extracting skeleton sequences reliably in low-light or completely dark environments is a major challenge for recognizing them. This paper proposes detecting human body keypoints in far-infrared images, which makes pose estimation possible under challenging conditions such as total darkness, smoke, inclement weather and glare. Far-infrared images, however, suffer from low resolution, noise and the peculiarities of thermal imaging, and the skeleton data must be delivered in real time to the downstream task. For these reasons, we propose a lightweight multi-stage attention network (LMANet) to detect human keypoints at night. The network enlarges the receptive field to gather contextual information that assists the detection of neighboring keypoints; to keep the model lightweight, it is limited to two stages. An attention module selects the most informative channels and highlights keypoint features while suppressing background interference. To detect keypoints reliably in complex environments, we also adopt hard example mining, which improves the accuracy of low-confidence keypoints. The network is validated on two visible-light datasets, where it performs strongly. Because no public far-infrared pose estimation dataset exists, we introduce far-infrared images into pose estimation by selecting and annotating 700 images from several public far-infrared object detection, segmentation and action recognition datasets; our algorithm also performs well on this new dataset. The keypoint annotations will be released after publication.
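
As a rough illustration of the two mechanisms the abstract highlights, the sketch below shows a channel-attention gate that re-weights feature channels and an online hard-keypoint-mining loss that concentrates training on the hardest, low-confidence joints. This is a minimal PyTorch sketch under our own assumptions, not the authors' released implementation; the module names, the reduction ratio and the number of mined joints (topk=8) are illustrative.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Re-weights feature channels so informative ones are emphasised."""
        def __init__(self, channels: int, reduction: int = 4):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                              # global context per channel
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),                                         # per-channel weights in (0, 1)
            )
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x * self.gate(x)                                   # suppress background channels

    def ohkm_heatmap_loss(pred: torch.Tensor, target: torch.Tensor, topk: int = 8) -> torch.Tensor:
        """Keep only the hardest keypoints (largest per-joint MSE) in each sample.
        pred, target: (batch, num_joints, H, W) heatmaps."""
        per_joint = ((pred - target) ** 2).mean(dim=(2, 3))           # (batch, num_joints)
        hardest, _ = per_joint.topk(topk, dim=1)                      # hardest joints only
        return hardest.mean()

    if __name__ == "__main__":
        feats = torch.randn(2, 64, 48, 64)
        print(ChannelAttention(64)(feats).shape)                      # torch.Size([2, 64, 48, 64])
        pred, target = torch.randn(2, 16, 64, 48), torch.randn(2, 16, 64, 48)
        print(ohkm_heatmap_loss(pred, target))                        # scalar loss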

Author information

Corresponding author

Correspondence to Ying Zang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zang, Y., Fan, C., Zheng, Z. et al. Pose estimation at night in infrared images using a lightweight multi-stage attention network. SIViP 15, 1757–1765 (2021). https://doi.org/10.1007/s11760-021-01916-3
