Real-Time 3D Hand Pose Estimation with 3D Convolutional Neural Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2018-04-16, DOI: 10.1109/tpami.2018.2827052
Liuhao Ge, Hui Liang, Junsong Yuan, Daniel Thalmann

In this paper, we present a novel method for real-time 3D hand pose estimation from single depth images using 3D Convolutional Neural Networks (CNNs). Image-based features extracted by 2D CNNs are not directly suitable for 3D hand pose estimation because they lack 3D spatial information. Our proposed 3D CNN-based method takes a 3D volumetric representation of the hand depth image as input and extracts 3D features from it, so it can capture the 3D spatial structure of the hand and accurately regress the full 3D hand pose in a single pass. To make the 3D CNN robust to variations in hand size and global orientation, we perform 3D data augmentation on the training data. To further improve estimation accuracy, we propose applying 3D deep network architectures and leveraging the complete hand surface as intermediate supervision for learning 3D hand pose from depth images. Extensive experiments on three challenging datasets demonstrate that our proposed approach outperforms baselines and state-of-the-art methods. A cross-dataset experiment also shows that our method generalizes well. Furthermore, our method is fast: our implementation runs at over 91 frames per second on a standard computer with a single GPU.
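
As a rough illustration of the input pipeline the abstract describes, the sketch below back-projects a depth image to 3D points, applies a simple 3D augmentation (rotation and scaling about the centroid), and fills a voxel grid. It is not the authors' implementation. The abstract does not specify the volumetric encoding, so a binary occupancy grid is assumed here; the function names, the pinhole intrinsics `fx, fy, cx, cy`, the 32-voxel resolution, and the restriction to rotation about the z-axis are likewise hypothetical choices made for brevity.

```python
import math


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of depth values, 0 = no reading)
    to 3D points with an assumed pinhole camera model."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def augment(points, angle_rad, scale):
    """Rotate points about the z-axis through their centroid, then scale
    them about the same centroid (a simplified 3D augmentation)."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    out = []
    for x, y, z in points:
        dx, dy, dz = x - c[0], y - c[1], z - c[2]
        rx = cos_a * dx - sin_a * dy
        ry = sin_a * dx + cos_a * dy
        out.append((c[0] + scale * rx, c[1] + scale * ry, c[2] + scale * dz))
    return out


def voxelize(points, resolution=32):
    """Binary occupancy grid over the cube that bounds the points.
    Assumption: the paper's actual voxel encoding may differ."""
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    side = max(maxs[i] - mins[i] for i in range(3)) or 1.0
    origin = [(mins[i] + maxs[i]) / 2 - side / 2 for i in range(3)]
    grid = [[[0] * resolution for _ in range(resolution)]
            for _ in range(resolution)]
    for p in points:
        idx = []
        for i in range(3):
            k = int((p[i] - origin[i]) / side * resolution)
            idx.append(max(0, min(resolution - 1, k)))  # clamp to the grid
        grid[idx[0]][idx[1]][idx[2]] = 1
    return grid
```

In training, `angle_rad` and `scale` would be sampled at random for each example, and the same transform would be applied to the ground-truth joint positions so that labels stay aligned with the voxel grid. The resulting grid is the kind of volumetric input a 3D CNN would consume.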

Updated: 2019-03-06