Bi-Stream Pose-Guided Region Ensemble Network for Fingertip Localization From Stereo Images.
IEEE Transactions on Neural Networks and Learning Systems ( IF 10.4 ) Pub Date : 2020-02-11 , DOI: 10.1109/tnnls.2020.2964037
Guijin Wang , Cairong Zhang , Xinghao Chen , Xiangyang Ji , Jing-Hao Xue , Hang Wang

In human-computer interaction, it is important to accurately estimate the hand pose, especially the fingertips. However, traditional approaches to fingertip localization mainly rely on depth images and thus suffer considerably from noise and missing values. As an alternative to depth images, stereo images can also provide 3-D information about hands. There are nevertheless limitations on the dataset size, global viewpoints, hand articulations, and hand shapes in publicly available stereo-based hand pose datasets. To mitigate these limitations and promote further research on hand pose estimation from stereo images, we build a new large-scale binocular hand pose dataset called THU-Bi-Hand, offering a new perspective for fingertip localization. The THU-Bi-Hand dataset contains 447k pairs of stereo images of different hand shapes from ten subjects, with accurate 3-D location annotations of the wrist and five fingertips. Captured with minimal restriction on the range of hand motion, the dataset covers a large global viewpoint space and hand articulation space. To better present the performance of fingertip localization on THU-Bi-Hand, we propose a novel scheme termed bi-stream pose-guided region ensemble network (Bi-Pose-REN). It extracts more representative feature regions around joints in the feature maps under the guidance of the previously estimated pose. The feature regions are integrated hierarchically according to the topology of hand joints to regress a refined hand pose. Bi-Pose-REN and several existing methods are evaluated on THU-Bi-Hand, providing benchmarks for further research. Experimental results show that our Bi-Pose-REN achieves the best performance on THU-Bi-Hand.
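The pose-guided region extraction described above can be sketched roughly as follows. This is a minimal, hypothetical NumPy illustration of the idea (cropping feature-map patches around joint locations given by a previous pose estimate and stacking them for later fusion); the function names, the 4x4 patch size, and the six-joint layout are assumptions, not the authors' implementation, and the hierarchical topology-aware fusion with fully connected layers is omitted.

```python
import numpy as np

def crop_region(feature_map, center_uv, size=4):
    """Crop a size x size spatial patch from a (C, H, W) feature map,
    centered at a joint's projected (u, v) location, clipped to bounds."""
    c, h, w = feature_map.shape
    u = int(np.clip(center_uv[0], size // 2, w - size // 2 - 1))
    v = int(np.clip(center_uv[1], size // 2, h - size // 2 - 1))
    return feature_map[:, v - size // 2 : v + size // 2,
                          u - size // 2 : u + size // 2]

def pose_guided_regions(feature_map, joint_uv):
    """Extract one feature region per joint under the guidance of the
    previously estimated pose, then stack them; in the full network these
    regions would be fused hierarchically along the hand-joint topology."""
    regions = [crop_region(feature_map, uv) for uv in joint_uv]
    return np.stack(regions)  # (num_joints, C, size, size)

# Toy example: a 32-channel 16x16 feature map; wrist plus five fingertips.
fmap = np.random.rand(32, 16, 16).astype(np.float32)
prev_pose = [(8, 8), (3, 3), (5, 2), (8, 2), (11, 3), (13, 5)]  # (u, v) per joint
feats = pose_guided_regions(fmap, prev_pose)
print(feats.shape)  # (6, 32, 4, 4)
```

In the bi-stream setting, one such extraction would run per stereo view, with the two streams' region features merged before the final pose regression.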

Updated: 2020-02-11