Pose estimation at night in infrared images using a lightweight multi-stage attention network

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Human keypoint detection is a fundamental task in computer vision and a prerequisite for human action recognition, behavior analysis and human–computer interaction. Since most abnormal actions occur at night, extracting skeleton sequences reliably in low-light or completely dark environments is a major challenge for recognizing them. This paper proposes detecting human body keypoints in far-infrared images, which makes pose estimation possible under challenging conditions such as total darkness, smoke, inclement weather and glare. Far-infrared images, however, suffer from low resolution, noise and the peculiarities of thermal imaging, and the skeleton data must be delivered in real time to the downstream task. For these reasons, we propose a lightweight multi-stage attention network (LMANet) to detect human keypoints at night. The network enlarges the receptive field to gather contextual information that assists the detection of neighboring keypoints; to keep the model lightweight, it is limited to two stages. An attention module selects the most informative channels and highlights keypoint features while suppressing background interference. To detect keypoints reliably in complex environments, we also adopt hard example mining, which improves the accuracy of low-confidence keypoints. The network is validated on two visible-light datasets, where it performs strongly. Because no public far-infrared pose estimation dataset exists, we introduce far-infrared images into pose estimation by selecting and annotating 700 images from several public far-infrared object detection, segmentation and action recognition datasets; our algorithm also performs well on this new dataset. The keypoint annotations will be released after publication.
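
As a rough illustration of the two mechanisms the abstract highlights, the sketch below shows a channel-attention gate that re-weights feature channels and an online hard-keypoint-mining loss that concentrates training on the hardest, low-confidence joints. This is a minimal PyTorch sketch under our own assumptions, not the authors' released implementation; the module names, the reduction ratio and the number of mined joints (topk=8) are illustrative.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Re-weights feature channels so informative ones are emphasised."""
        def __init__(self, channels: int, reduction: int = 4):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                              # global context per channel
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),                                         # per-channel weights in (0, 1)
            )
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x * self.gate(x)                                   # suppress background channels

    def ohkm_heatmap_loss(pred: torch.Tensor, target: torch.Tensor, topk: int = 8) -> torch.Tensor:
        """Keep only the hardest keypoints (largest per-joint MSE) in each sample.
        pred, target: (batch, num_joints, H, W) heatmaps."""
        per_joint = ((pred - target) ** 2).mean(dim=(2, 3))           # (batch, num_joints)
        hardest, _ = per_joint.topk(topk, dim=1)                      # hardest joints only
        return hardest.mean()

    if __name__ == "__main__":
        feats = torch.randn(2, 64, 48, 64)
        print(ChannelAttention(64)(feats).shape)                      # torch.Size([2, 64, 48, 64])
        pred, target = torch.randn(2, 16, 64, 48), torch.randn(2, 16, 64, 48)
        print(ohkm_heatmap_loss(pred, target))                        # scalar loss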

Author information

Corresponding author

Correspondence to Ying Zang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zang, Y., Fan, C., Zheng, Z. et al. Pose estimation at night in infrared images using a lightweight multi-stage attention network. SIViP 15, 1757–1765 (2021). https://doi.org/10.1007/s11760-021-01916-3
