Learning a deep predictive coding network for a semi-supervised 3D-hand pose estimation
IEEE/CAA Journal of Automatica Sinica (IF 15.3), Pub Date: 2020-03-27, DOI: 10.1109/jas.2020.1003090
Jamal Banzi, Isack Bulugu, Zhongfu Ye

In this paper we present a CNN-based approach for real-time 3D hand pose estimation from depth sequences. Prior discriminative approaches have achieved remarkable success but face two main challenges. First, they are fully supervised and therefore require large amounts of annotated training data to extract dynamic information from a hand representation. Second, they depend on unreliable hand detectors, which either rest on strong assumptions or are weak, and which often fail in situations such as complex environments and scenes with multiple hands. In contrast to these methods, this paper presents an approach that can be considered semi-supervised: it performs predictive coding of image sequences of hand poses in order to capture, without supervision, the latent features underlying a given image. The hand is modelled using a novel latent tree dependency model (LDTM), which transforms internal joint locations into an explicit representation. The modelled hand topology is then integrated with the pose estimator using a data-dependent method to jointly learn the latent variables of the posterior pose appearance and of the pose configuration, respectively. Finally, an unsupervised error term that is part of the recurrent architecture ensures smooth estimates of the final pose. Experiments on three challenging public datasets, ICVL, MSRA, and NYU, show that the proposed method performs comparably to or better than state-of-the-art approaches.
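The abstract does not give the exact form of the predictive-coding error or of the smoothing term, so the sketch below is only an illustration of the general ideas, not the authors' method. It assumes a PredNet-style error unit (Lotter et al.), which stacks the rectified positive and negative differences between an observed depth frame and its prediction, and a simple mean-squared temporal-difference penalty as a stand-in for "smooth estimates of the final pose". The function names and the copy-the-previous-frame predictor are hypothetical placeholders for a trained recurrent network.

```python
import numpy as np


def prediction_error(actual: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    """PredNet-style error unit (assumed, not from the paper).

    Returns an array of shape (2, *actual.shape): channel 0 holds the
    rectified under-prediction, channel 1 the rectified over-prediction.
    """
    pos = np.maximum(actual - predicted, 0.0)  # observed depth exceeds prediction
    neg = np.maximum(predicted - actual, 0.0)  # prediction exceeds observed depth
    return np.stack([pos, neg], axis=0)


def sequence_prediction_loss(frames: np.ndarray) -> float:
    """Mean absolute prediction error over a depth sequence of shape (T, H, W).

    Hypothetical predictor: each frame is 'predicted' by copying the previous
    one; a learned recurrent model would replace this guess.
    """
    losses = []
    for t in range(1, len(frames)):
        err = prediction_error(frames[t], frames[t - 1])
        # pos + neg == |actual - predicted|, so this is the per-pixel MAE
        losses.append(err.sum(axis=0).mean())
    return float(np.mean(losses))


def temporal_smoothness(poses: np.ndarray) -> float:
    """Assumed smoothness penalty on pose estimates of shape (T, J, 3).

    Averages the squared Euclidean displacement of each joint between
    consecutive frames; lower values mean smoother trajectories.
    """
    diffs = poses[1:] - poses[:-1]
    return float(np.mean(np.sum(diffs ** 2, axis=-1)))


if __name__ == "__main__":
    # Toy depth sequence: the scene jumps by 1.0 once, then stays constant.
    frames = np.zeros((3, 4, 4))
    frames[1:] += 1.0
    print(sequence_prediction_loss(frames))

    # Toy pose track: 2 joints move 1.0 along x once, then stay put.
    poses = np.zeros((3, 2, 3))
    poses[1:, :, 0] = 1.0
    print(temporal_smoothness(poses))
```

Splitting the error into two rectified channels keeps the sign of the mismatch visible to later layers, which is the motivation given for this design in PredNet; whether the paper uses the same split is not stated in the abstract.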

Updated: 2020-03-27