View Invariant 3D Human Pose Estimation
IEEE Transactions on Circuits and Systems for Video Technology ( IF 8.4 ) Pub Date : 2020-12-01 , DOI: 10.1109/tcsvt.2019.2928813
Guoqiang Wei , Cuiling Lan , Wenjun Zeng , Zhibo Chen

The recent success of neural networks has significantly advanced the performance of 3D human pose estimation from 2D input images. However, the diversity of capturing viewpoints and the flexibility of human poses remain significant challenges. In this paper, we propose a view-invariant 3D human pose estimation module to alleviate the effects of viewpoint diversity. The proposed framework consists of a base network, which provides an initial estimate of the 3D pose; a view-invariant hierarchical correction network (VI-HC) on top of it, which learns to refine the 3D pose under consistent views; and a view-invariant discriminative network (VID), which enforces high-level constraints over body configurations. In VI-HC, the initial 3D pose inputs are automatically transformed to consistent views for further refinement at the global body and local body part levels, respectively. In the VID, under consistent viewpoints, we use adversarial learning to differentiate between estimated 3D poses and real 3D poses, avoiding implausible results. The experimental results demonstrate that the constraint on viewpoint consistency can dramatically enhance the performance of 3D human pose estimation. Our module shows robustness across different 3D pose base networks and achieves a significant improvement (about 9%) over a powerful baseline on the public 3D pose estimation benchmark Human3.6M.
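The core idea behind the view-transformation step in VI-HC is to rotate an initial 3D pose into a canonical viewpoint before refinement, then map the refined pose back. As a minimal sketch of that idea (not the paper's method, which learns the transformation): the snippet below rotates a toy pose about the vertical axis so the hip line aligns with the x-axis. The joint indices, the choice of the hip line, and the canonical direction are all illustrative assumptions.

```python
import numpy as np

def canonicalize_view(pose, left_hip=1, right_hip=4):
    """Rotate a 3D pose (J x 3 array, y vertical) about the y-axis so the
    hip line lies along the x-axis. A closed-form stand-in for the view
    transformation that VI-HC learns; joint indices are assumptions.
    Returns the canonicalized pose and the rotation matrix R used."""
    hip_vec = pose[left_hip] - pose[right_hip]
    # angle of the hip line in the horizontal (x-z) plane
    theta = np.arctan2(hip_vec[2], hip_vec[0])
    c, s = np.cos(theta), np.sin(theta)
    # rotation about the vertical (y) axis by theta
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    # rows of `pose` are joint coordinates, so apply R via pose @ R.T
    return pose @ R.T, R

# toy 5-joint pose: head, left hip, knee, foot, right hip (hypothetical skeleton)
pose = np.array([[0.0, 1.6, 0.0],
                 [0.2, 1.0, 0.1],
                 [0.0, 0.5, 0.0],
                 [0.0, 0.0, 0.0],
                 [-0.2, 1.0, -0.1]])
canon, R = canonicalize_view(pose)
hip_vec = canon[1] - canon[4]
print(np.isclose(hip_vec[2], 0.0))  # hip line now has no z component
```

Because R is orthonormal, a refinement computed in the canonical view can be mapped back to the original viewpoint with `refined @ R` (i.e., applying the inverse rotation `R.T`), which is what lets the refinement module see every pose from a consistent view.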

Updated: 2020-12-01