3D Hand Pose Estimation Using Synthetic Data and Weakly Labeled RGB Images
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6) Pub Date: 2020-05-11, DOI: 10.1109/tpami.2020.2993627
Yujun Cai, Liuhao Ge, Jianfei Cai, Nadia Magnenat Thalmann, Junsong Yuan

Compared with depth-based 3D hand pose estimation, inferring 3D hand pose from monocular RGB images is more challenging due to substantial depth ambiguity and the difficulty of obtaining fully annotated training data. Unlike existing learning-based monocular RGB approaches that require accurate 3D annotations for training, we propose to leverage depth images, which can be easily obtained from commodity RGB-D cameras, during training, while taking only RGB inputs for 3D joint prediction at test time. In this way, we alleviate the burden of costly 3D annotations on real-world datasets. In particular, we propose a weakly supervised method that adapts from a fully annotated synthetic dataset to a weakly labeled real-world RGB dataset with the aid of a depth regularizer, which serves as weak supervision for 3D pose prediction. To further exploit the physical structure of 3D hand poses, we present a novel CVAE-based statistical framework that embeds a pose-specific subspace from RGB images, from which the 3D hand joint locations can then be inferred. Extensive experiments on benchmark datasets show that our approach outperforms baselines and state-of-the-art methods, demonstrating the effectiveness of the proposed depth regularizer and the CVAE-based framework.
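As a rough illustration of the training objective described in the abstract, the following minimal sketch combines a supervised 2D joint loss on weakly labeled real images with a depth-regularizer term that compares a depth map rendered from the predicted 3D pose against the depth image captured by the RGB-D camera. All function and parameter names here are hypothetical, not taken from the paper, and the rendering step is assumed to happen elsewhere:

```python
# Hypothetical sketch of a weakly supervised loss with a depth regularizer.
# All names are illustrative; the paper's actual formulation may differ.

def l2(a, b):
    """Mean squared error between two equal-length flat sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def weakly_supervised_loss(pred_2d, gt_2d, rendered_depth, observed_depth, lam=0.1):
    """2D joint supervision plus a depth-map consistency term.

    pred_2d / gt_2d: flattened predicted and annotated 2D joint coordinates
        (the weak labels available on real RGB images).
    rendered_depth: depth map rendered from the predicted 3D hand pose.
    observed_depth: depth image from the RGB-D camera (used only in training).
    lam: weight of the depth-regularization term.
    """
    joint_loss = l2(pred_2d, gt_2d)          # supervised by weak 2D labels
    depth_loss = l2(rendered_depth, observed_depth)  # weak 3D supervision
    return joint_loss + lam * depth_loss
```

At test time only the RGB branch producing the pose is used, so the depth term simply drops out of inference.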
