
3D human pose estimation with multi-scale graph convolution and hierarchical body pooling

  • Special Issue Paper
  • Published in: Multimedia Systems

Abstract

Since human pose can be naturally represented as a graph, graph convolutional networks (GCNs) have recently been applied to 3D human pose estimation with promising results. However, most GCN-based methods use vanilla graph convolution, which aggregates features only from 1-hop neighbors, so long-range dependencies between joints can only be captured by stacking multiple graph convolution layers. To alleviate this problem, we propose a multi-scale graph convolution that aggregates features of neighbors at different distances, and we apply it to nodes with specified neighbor types. We further propose hierarchical body pooling to aggregate and share body-level and body-part-level context information. Based on these components, we develop a lightweight GCN for 3D pose lifting by repeatedly stacking a residual block of multi-scale graph convolution and a hierarchical body pooling layer. Experimental results on the Human3.6M dataset indicate that our network achieves state-of-the-art performance with much lower model complexity.
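To make the two components described in the abstract more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes that the multi-scale graph convolution averages the features of joints at exactly k hops (k = 0, ..., K-1), applies a separate weight matrix per scale and sums the results, and that hierarchical body pooling adds mean-pooled body-part and whole-body context back to each joint. The neighbor-type selection, normalization, nonlinearities and residual connections of the actual network are omitted, and the function names, the 5-joint skeleton and the body-part grouping are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # shape: d_in rows x d_out columns


def hop_distances(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """All-pairs hop counts via BFS from every joint (-1 means unreachable)."""
    adj: Dict[int, List[int]] = {i: [] for i in range(num_nodes)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for src in range(num_nodes):
        d = [-1] * num_nodes
        d[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] == -1:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def mean_vectors(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; an empty set contributes a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[c] for v in vectors) / len(vectors) for c in range(dim)]


def matvec(x: Vector, w: Matrix) -> Vector:
    """Row vector x (length d_in) times matrix w (d_in x d_out)."""
    return [sum(x[i] * w[i][o] for i in range(len(x))) for o in range(len(w[0]))]


def multi_scale_graph_conv(x: List[Vector], dist: List[List[int]],
                           weights: List[Matrix]) -> List[Vector]:
    """Hypothetical layer: out_j = sum_k mean{x_i : hop(i, j) == k} @ W_k."""
    dim = len(x[0])
    out = []
    for j in range(len(x)):
        acc = [0.0] * len(weights[0][0])
        for k, w in enumerate(weights):
            ring = [x[i] for i in range(len(x)) if dist[j][i] == k]
            h = matvec(mean_vectors(ring, dim), w)
            acc = [a + b for a, b in zip(acc, h)]
        out.append(acc)
    return out


def hierarchical_body_pool(x: List[Vector], parts: List[List[int]]) -> List[Vector]:
    """Hypothetical pooling: add whole-body and body-part mean context to each joint."""
    dim = len(x[0])
    body = mean_vectors(x, dim)
    out = [[a + b for a, b in zip(v, body)] for v in x]  # body-level context
    for part in parts:
        ctx = mean_vectors([x[j] for j in part], dim)  # body-part-level context
        for j in part:
            out[j] = [o + p for o, p in zip(out[j], ctx)]
    return out


if __name__ == "__main__":
    # Toy skeleton: 0 pelvis, 1 spine, 2 head, 3 and 4 hands attached to the spine.
    edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
    dist = hop_distances(5, edges)
    x = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    identity_per_scale = [[[1.0]], [[1.0]], [[1.0]]]  # K = 3 scales
    print(multi_scale_graph_conv(x, dist, identity_per_scale))
    print(hierarchical_body_pool(x, [[0, 1, 2], [3, 4]]))
```

Because the hop rings are precomputed from the fixed skeleton, one such layer already reaches joints up to K-1 hops away, whereas a vanilla 1-hop graph convolution would need K-1 stacked layers to cover the same range.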


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (U20B2063), the Sichuan Science and Technology Program (2020YFS0057), and the Fundamental Research Funds for the Central Universities (ZYGX2019Z015).

Author information

Corresponding author

Correspondence to Hong Wu.



About this article


Cite this article

Huang, K., Sui, T. & Wu, H. 3D human pose estimation with multi-scale graph convolution and hierarchical body pooling. Multimedia Systems 28, 403–412 (2022). https://doi.org/10.1007/s00530-021-00808-3


  • DOI: https://doi.org/10.1007/s00530-021-00808-3
