U-shaped spatial–temporal transformer network for 3D human pose estimation
Machine Vision and Applications (IF 2.4), Pub Date: 2022-09-04, DOI: 10.1007/s00138-022-01334-6
Honghong Yang, Longfei Guo, Yumei Zhang, Xiaojun Wu

3D human pose estimation has made considerable progress with the development of convolutional neural networks. However, accurately estimating 3D joint locations from single-view images or videos remains challenging because of depth ambiguity and severe occlusion. Motivated by the effectiveness of vision transformers in computer vision tasks, we present a novel U-shaped spatial–temporal transformer-based network (U-STN) for 3D human pose estimation. The core idea of the proposed method is to process the human joints with a multi-scale, multi-level U-shaped transformer model. We construct a multi-scale architecture based on the human skeletal topology, in which local and global features are processed at three different scales under kinematic constraints. Furthermore, a multi-level feature representation is introduced by fusing intermediate features from different depths of the U-shaped network. With skeletal-constrained pooling and unpooling operations devised for U-STN, the network can transform features across scales and extract meaningful semantic features at all levels. Experiments on two challenging benchmark datasets show that the proposed method achieves good performance on 2D-to-3D pose estimation. The code is available at https://github.com/l-fay/Pose3D.
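
The skeletal-constrained pooling and unpooling mentioned in the abstract can be pictured with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that): the 17-joint layout, the five-part grouping in `PART_GROUPS`, the use of mean pooling, and the function names `skeletal_pool` and `skeletal_unpool` are all assumptions made here for illustration.

```python
import numpy as np

# Hypothetical grouping of a 17-joint skeleton into 5 body parts.
# The joint indices are illustrative assumptions, not taken from the paper.
PART_GROUPS = [
    [0, 7, 8, 9, 10],  # torso and head
    [1, 2, 3],         # right leg
    [4, 5, 6],         # left leg
    [11, 12, 13],      # left arm
    [14, 15, 16],      # right arm
]


def skeletal_pool(x, groups):
    """Average joint features within each group: (..., J, C) -> (..., G, C)."""
    return np.stack([x[..., g, :].mean(axis=-2) for g in groups], axis=-2)


def skeletal_unpool(y, groups, num_joints):
    """Copy each group's feature back to its joints: (..., G, C) -> (..., J, C)."""
    out = np.zeros(y.shape[:-2] + (num_joints, y.shape[-1]), dtype=y.dtype)
    for i, g in enumerate(groups):
        # Broadcast the single group feature over all joints in the group.
        out[..., g, :] = y[..., i:i + 1, :]
    return out


# Three scales: joints (17) -> parts (5) -> whole body (1), then back up.
x = np.random.rand(9, 17, 32)                      # (frames, joints, channels)
parts = skeletal_pool(x, PART_GROUPS)              # shape (9, 5, 32)
body = skeletal_pool(parts, [list(range(5))])      # shape (9, 1, 32)
restored = skeletal_unpool(parts, PART_GROUPS, 17) # shape (9, 17, 32)
```

In a U-shaped network of this kind, features restored by unpooling would typically be fused with same-scale features from the downsampling path; how U-STN performs that fusion is not specified in the abstract.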




Updated: 2022-09-05