
Deep Unsupervised 3D Human Body Reconstruction from a Sparse set of Landmarks

Published in: International Journal of Computer Vision

Abstract

In this paper we propose DeepMurf, the first deep unsupervised approach to human body reconstruction that estimates the body surface from a sparse set of landmarks. We first apply a denoising autoencoder to recover missing landmarks. We then apply an attention model to estimate body joints from the landmarks. Finally, a cascading network regresses the parameters of a statistical generative model that reconstructs the body. Our set of proposed loss functions allows us to train the network in an unsupervised way. Results on four public datasets show that our approach accurately reconstructs the human body from real-world mocap data.
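The three stages outlined in the abstract (landmark imputation, attention-based joint estimation, and body-model parameter regression) can be sketched roughly as follows. This is a minimal illustrative sketch only: the toy dimensions, weight matrices, and function names are assumptions of this sketch, not the paper's architecture, and the final stage (regressing statistical body-model parameters) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LANDMARKS, N_JOINTS, DIM = 12, 4, 3  # toy sizes, not the paper's

def impute_missing(landmarks, mask, W_enc, W_dec):
    """Denoising-autoencoder-style imputation (stage 1): encode the
    masked landmark set, decode a full reconstruction, and fill only
    the missing entries from it."""
    x = (landmarks * mask).reshape(-1)           # zero out missing markers
    h = np.tanh(W_enc @ x)                       # latent code
    recon = (W_dec @ h).reshape(N_LANDMARKS, DIM)
    return np.where(mask > 0, landmarks, recon)  # keep observed markers

def joints_from_landmarks(landmarks, scores):
    """Attention-style joint estimation (stage 2): each joint is a
    softmax-weighted combination of landmark positions."""
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return attn @ landmarks                      # (N_JOINTS, DIM)

# Toy demo with random (untrained) weights.
landmarks = rng.normal(size=(N_LANDMARKS, DIM))
mask = np.ones((N_LANDMARKS, 1))
mask[3] = 0.0                                    # pretend landmark 3 is missing
W_enc = rng.normal(scale=0.1, size=(8, N_LANDMARKS * DIM))
W_dec = rng.normal(scale=0.1, size=(N_LANDMARKS * DIM, 8))

full = impute_missing(landmarks, mask, W_enc, W_dec)
joints = joints_from_landmarks(full, rng.normal(size=(N_JOINTS, N_LANDMARKS)))
print(full.shape, joints.shape)  # (12, 3) (4, 3)
```

In the actual method the weights would be learned end to end with the unsupervised losses described in the paper; here they are random and serve only to show the data flow between stages.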


Availability of data and material

The authors have used public datasets in this work.

Notes

  1. We find 1 cm adequate for the evaluated datasets, though it can be adjusted for custom datasets.

  2. We omit \(R(\phi)\) for readability.

  3. Note that DeepMurf with \(\mathcal{L}_{\textit{unpose}}\) behaves similarly.



Funding

This work is partially supported by ICREA under the ICREA Academia programme, by the Spanish project PID2019-105093GB-I00 (MINECO/FEDER, UE), by the CERCA Programme/Generalitat de Catalunya, and by an Amazon Research Award (ARA).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Meysam Madadi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Javier Romero.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 12540 KB)


About this article


Cite this article

Madadi, M., Bertiche, H. & Escalera, S. Deep Unsupervised 3D Human Body Reconstruction from a Sparse set of Landmarks. Int J Comput Vis 129, 2499–2512 (2021). https://doi.org/10.1007/s11263-021-01488-2

