
Enhancing feature fusion for human pose estimation

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

Current human pose estimation methods rely mainly on designing efficient Convolutional Neural Network (CNN) architectures. These architectures typically consist of high-to-low-resolution sub-networks that learn semantic information, followed by low-to-high-resolution sub-networks that restore resolution to locate the keypoints. Low-level features have high spatial resolution but little semantic information, while high-level features are semantically rich but lack high-resolution detail, so fusing features from different levels is important for the final performance. However, most existing models implement feature fusion by simply concatenating low-level and high-level features, without accounting for the gap in spatial resolution and semantic level between them. In this paper, we propose a new feature fusion method for human pose estimation. We introduce high-level semantic information into low-level features to enhance the fusion. Furthermore, to preserve both high-level semantic information and high-resolution location details, we use Global Convolutional Network blocks to bridge the gap between low-level and high-level features. Experiments on the MPII and LSP human pose estimation datasets demonstrate that efficient feature fusion significantly improves performance. The code is available at: https://github.com/tongjiangwei/FeatureFusion.
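The abstract describes bridging the gap between high-resolution low-level features and semantically rich high-level features with Global Convolutional Network (GCN) blocks before fusion. Below is a minimal sketch of that idea, not the authors' released code: the GCN block follows the separable large-kernel design of Peng et al. (CVPR 2017), while the channel sizes, kernel size k, 1x1 reduction, and additive fusion are illustrative assumptions.

```python
# Hedged sketch: a GCN block approximates a large k x k convolution with two
# separable branches, (k x 1 then 1 x k) and (1 x k then k x 1), summed.
# Applying it to the low-level stream enlarges its receptive field (injecting
# semantic context) before it is combined with upsampled high-level features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, k=7):
        super().__init__()
        pad = k // 2
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=(k, 1), padding=(pad, 0)),
            nn.Conv2d(out_channels, out_channels, kernel_size=(1, k), padding=(0, pad)),
        )
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=(1, k), padding=(0, pad)),
            nn.Conv2d(out_channels, out_channels, kernel_size=(k, 1), padding=(pad, 0)),
        )

    def forward(self, x):
        # Both branches preserve spatial size, so they can be summed directly.
        return self.branch_a(x) + self.branch_b(x)


class FuseBlock(nn.Module):
    """Illustrative fusion: reduce and upsample high-level features, pass the
    low-level features through a GCN block, then add the two streams."""
    def __init__(self, low_channels, high_channels, out_channels, k=7):
        super().__init__()
        self.gcn = GCNBlock(low_channels, out_channels, k)
        self.reduce = nn.Conv2d(high_channels, out_channels, kernel_size=1)

    def forward(self, low, high):
        high = self.reduce(high)
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear", align_corners=False)
        return self.gcn(low) + high


# Usage with assumed feature-map shapes.
low = torch.randn(1, 64, 64, 64)    # high-resolution, low-level features
high = torch.randn(1, 256, 16, 16)  # low-resolution, high-level features
fused = FuseBlock(64, 256, 64)(low, high)
print(fused.shape)  # torch.Size([1, 64, 64, 64])
```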



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61771299.

Author information

Corresponding author

Correspondence to Xiangyang Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, R., Tong, J. & Wang, X. Enhancing feature fusion for human pose estimation. Machine Vision and Applications 31, 70 (2020). https://doi.org/10.1007/s00138-020-01104-2
