Enhanced 3D Human Pose Estimation from Videos by Using Attention-Based Neural Network with Dilated Convolutions
International Journal of Computer Vision (IF 11.6), Pub Date: 2021-02-26, DOI: 10.1007/s11263-021-01436-0
Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan K. Asari

The attention mechanism provides a sequential prediction framework for learning spatial models with enhanced implicit temporal consistency. In this work, we present a systematic design (from 2D to 3D) showing how conventional networks and other forms of constraints can be incorporated into the attention framework to learn long-range dependencies for pose estimation. The contribution of this paper is a systematic approach to designing and training attention-based models for end-to-end pose estimation, with the flexibility and scalability to take arbitrary video sequences as input. We achieve this by adapting the temporal receptive field through a multi-scale structure of dilated convolutions. In addition, the proposed architecture can easily be adapted to a causal model, enabling real-time performance. Any off-the-shelf 2D pose estimation systems, e.g. Our method achieves state-of-the-art performance and outperforms existing methods, reducing the mean per joint position error to 33.4 mm on the Human3.6M dataset. Our code is available at https://github.com/lrxjason/Attention3DHumanPose
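The abstract mentions three technical ideas: enlarging the temporal receptive field with a multi-scale stack of dilated convolutions, switching to a causal variant for real-time use, and reporting the mean per joint position error (MPJPE). The Python sketch below illustrates each idea in isolation. It is not the authors' implementation; the repository linked above contains that. The kernel size of 3 and the dilations 1, 3 and 9 are assumed illustrative values. The helper names (`receptive_field`, `dilated_conv1d`, `run_stack`, `context_window`, `mpjpe`) are hypothetical. The attention weighting and all learned parameters are omitted, and a fixed all-ones kernel is used so the receptive field is easy to see.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Input frames seen by one output frame of stacked stride-1 dilated 1D convs."""
    # Each layer with kernel size k and dilation d widens the window by (k - 1) * d.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x: Sequence[float], w: Sequence[float], dilation: int) -> List[float]:
    """Unpadded ("valid") dilated convolution over the time axis of a 1D signal."""
    span = (len(w) - 1) * dilation  # distance between the first and last tap
    return [
        sum(w[k] * x[i + k * dilation] for k in range(len(w)))
        for i in range(len(x) - span)
    ]


def run_stack(x: Sequence[float], w: Sequence[float], dilations: Sequence[int]) -> List[float]:
    """Apply one dilated conv per scale (hypothetical stand-in for a learned multi-scale block)."""
    out = list(x)
    for d in dilations:
        out = dilated_conv1d(out, w, d)
    return out


def context_window(t: int, rf: int, causal: bool) -> Tuple[int, int]:
    """Inclusive range of input frames used to predict the pose at frame t."""
    if causal:
        # Causal model: only the current and past frames are used, so the pose
        # for frame t can be predicted as soon as frame t arrives.
        return t - (rf - 1), t
    # Non-causal model: the window is centered on t and needs future frames.
    half = (rf - 1) // 2
    return t - half, t + half


def mpjpe(pred: Sequence[Sequence[Point3D]], gt: Sequence[Sequence[Point3D]]) -> float:
    """Mean Euclidean distance over all frames and joints (same unit as the input, e.g. mm)."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("pred and gt must have the same frames and joints")
    dists = [math.dist(p, g) for pf, gf in zip(pred, gt) for p, g in zip(pf, gf)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    dilations = [1, 3, 9]
    rf = receptive_field(3, dilations)
    print(rf)  # 27
    # With all-ones weights, a 27-frame input collapses to a single output that
    # sums every input frame exactly once: 0 + 1 + ... + 26 = 351.
    print(run_stack([float(i) for i in range(rf)], [1.0, 1.0, 1.0], dilations))  # [351.0]
    print(context_window(100, rf, causal=True))   # (74, 100)
    print(context_window(100, rf, causal=False))  # (87, 113)
    print(mpjpe([[(10.0, 0.0, 0.0), (0.0, 20.0, 0.0)]],
                [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]))  # 15.0
```

Each dilation here is three times the previous one and the kernel size is 3, so the taps tile the 27-frame input with no gaps or overlap. This is why every frame contributes exactly once. The centered window uses 13 past and 13 future frames, whereas the causal window uses the current frame plus 26 past frames, which removes the wait for future input. The `mpjpe` sketch deliberately omits any root-joint or rigid alignment that a specific Human3.6M evaluation protocol may apply.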

Updated: 2021-02-26