3D human pose estimation with multi-scale graph convolution and hierarchical body pooling

Huang, Ke; Sui, TianQi; Wu, Hong

doi:10.1007/s00530-021-00808-3

Special Issue Paper
Published: 28 May 2021

Multimedia Systems, volume 28, pages 403–412 (2022)

Abstract

Since human pose can be naturally represented by a graph, graph convolutional networks (GCNs) have recently been proposed for 3D human pose estimation and have achieved promising results. However, most GCN-based methods use vanilla graph convolution, which aggregates features only from 1-hop neighbors, so long-range dependencies between joints can be captured only by stacking multiple graph convolution layers. To alleviate this problem, we propose a multi-scale graph convolution that aggregates features of neighbors at different distances, and we apply it to nodes with specified neighbor types. We further propose a hierarchical body pooling layer to aggregate and share body-level and body-part-level context information. Based on these components, we develop a lightweight GCN for 3D pose lifting by repeatedly stacking a residual block of multi-scale graph convolution and a hierarchical body pooling layer. Experimental results on the Human3.6M dataset indicate that our network achieves state-of-the-art performance with much lower model complexity.
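The abstract names two components without giving their formulas, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes a generic k-hop aggregation (one weight matrix per hop distance, with row-normalized k-hop adjacency) and a simple mean pooling over joint groups whose results are concatenated back onto every joint. The function names, the 5-joint chain skeleton, the part grouping, and the feature sizes are all hypothetical choices made for illustration.

```python
import numpy as np

def hop_distance(adj):
    """All-pairs hop counts on an unweighted graph, found by expanding one hop at a time."""
    n = adj.shape[0]
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0)
    reach = np.eye(n, dtype=bool)          # pairs reachable within the current hop count
    step = adj.astype(bool).astype(int)
    k = 0
    while True:
        k += 1
        new_reach = reach | ((reach.astype(int) @ step) > 0)
        newly = new_reach & ~reach         # pairs whose shortest path is exactly k hops
        if not newly.any():
            break
        dist[newly] = k
        reach = new_reach
    return dist

def multi_scale_graph_conv(x, adj, weights):
    """x: (J, C_in) joint features; weights[k]: (C_in, C_out) for hop distance k = 0..K."""
    dist = hop_distance(adj)
    out = np.zeros((x.shape[0], weights[0].shape[1]))
    for k, w in enumerate(weights):
        a_k = (dist == k).astype(float)    # neighbors exactly k hops away (k = 0 is the joint itself)
        deg = a_k.sum(axis=1, keepdims=True)
        a_k = a_k / np.maximum(deg, 1.0)   # row-normalize; joints with no k-hop neighbor contribute zero
        out += a_k @ x @ w
    return out

def hierarchical_pool(x, parts):
    """Append each joint's body-part mean and the whole-body mean to its own features."""
    body = np.broadcast_to(x.mean(axis=0), x.shape)
    part = np.zeros_like(x)
    for idx in parts:
        part[idx] = x[idx].mean(axis=0)
    return np.concatenate([x, part, body], axis=1)

# Hypothetical 5-joint chain skeleton: 0-1-2-3-4
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))                             # 2D joint coordinates as input features
weights = [rng.normal(size=(2, 8)) for _ in range(3)]   # hop distances 0, 1, 2
h = multi_scale_graph_conv(x, adj, weights)
pooled = hierarchical_pool(h, parts=[[0, 1], [2, 3, 4]])
```

With three weight matrices, a single layer in this sketch already mixes information from joints up to two hops apart, which is the effect the abstract attributes to multi-scale aggregation; a vanilla 1-hop layer would need to be stacked to reach the same joints.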

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (U20B2063), the Sichuan Science and Technology Program (2020YFS0057), and the Fundamental Research Funds for the Central Universities (ZYGX2019Z015).

Author information

Authors and Affiliations

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China
Ke Huang, TianQi Sui & Hong Wu

Corresponding author

Correspondence to Hong Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Huang, K., Sui, T. & Wu, H. 3D human pose estimation with multi-scale graph convolution and hierarchical body pooling. Multimedia Systems 28, 403–412 (2022). https://doi.org/10.1007/s00530-021-00808-3

Received: 15 October 2020
Accepted: 04 May 2021
Published: 28 May 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s00530-021-00808-3
