UniPose+: A Unified Framework for 2D and 3D Human Pose Estimation in Images and Videos.
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2022-11-07, DOI: 10.1109/tpami.2021.3124736
Bruno Artacho, Andreas Savakis

We propose UniPose+, a unified framework for 2D and 3D human pose estimation in images and videos. The UniPose+ architecture leverages multi-scale feature representations to increase the effectiveness of backbone feature extractors, without a significant increase in network size and without postprocessing. Current pose estimation methods rely heavily on statistical postprocessing or predefined anchor poses for joint localization. The UniPose+ framework combines contextual information across scales with joint localization through Gaussian heatmap modulation at the decoder output, estimating 2D and 3D human pose in a single stage with state-of-the-art accuracy and without relying on predefined anchor poses. The multi-scale representations enabled by the waterfall module in the UniPose+ framework leverage the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields of view comparable to spatial pyramid configurations. Our results on multiple datasets demonstrate that UniPose+, with an HRNet, ResNet or SENet backbone and the waterfall module, is a robust and efficient architecture for single-person 2D and 3D pose estimation in single images and videos.
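
The abstract states that joints are localized from Gaussian heatmaps at the decoder output but gives no further detail. The sketch below shows only the generic heatmap-regression idea it builds on: each joint's training target is a 2D Gaussian centred on the joint, and a prediction is decoded by taking the location of the peak response. The function names, the grid size and sigma=2.0 are illustrative assumptions, not values from the paper, and the sketch does not reproduce whatever "modulation" UniPose+ applies.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Hypothetical helper: a height x width grid with a 2D Gaussian peak at (cx, cy).

    This is the common heatmap-regression target; sigma is an assumed value.
    """
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def decode_argmax(heatmap):
    """Return the (x, y) grid location of the maximum response."""
    best_val, best_xy = -math.inf, (0, 0)
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if v > best_val:
                best_val, best_xy = v, (x, y)
    return best_xy


def decode_keypoints(heatmaps):
    """Decode one (x, y) joint location per heatmap channel."""
    return [decode_argmax(h) for h in heatmaps]


if __name__ == "__main__":
    # Two hypothetical joints on a 16 x 12 output grid.
    joints = [(3, 4), (10, 7)]
    maps = [gaussian_heatmap(12, 16, cx, cy) for cx, cy in joints]
    print(decode_keypoints(maps))  # [(3, 4), (10, 7)]
```

Because every joint is read directly from its own heatmap channel, no anchor poses or statistical postprocessing are needed in this simplified form; real systems usually add sub-pixel refinement on top of the argmax.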

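The claim that a cascade of filters can keep multi-scale fields of view comparable to a spatial pyramid can be checked with simple receptive-field arithmetic. A 3x3 kernel with dilation r spans 2r+1 pixels. In a parallel (spatial pyramid) layout each branch sees the input directly, so its field of view is just that extent; in a cascade each stage consumes the previous stage's output, so the fields of view accumulate while each intermediate output still offers its own scale. The sketch assumes stride-1 3x3 atrous convolutions and dilation rates 6, 12, 18, 24 (typical ASPP-style values, assumed here since the abstract does not list them), and it ignores the backbone's own receptive field.

```python
def dilated_kernel_extent(kernel_size, dilation):
    """Effective spatial extent of a dilated (atrous) kernel."""
    return dilation * (kernel_size - 1) + 1


def parallel_fovs(rates, kernel_size=3):
    """Field of view of each branch in a spatial-pyramid (ASPP-style) layout.

    Every branch sees the module input directly, so its field of view is
    just its own kernel extent.
    """
    return [dilated_kernel_extent(kernel_size, r) for r in rates]


def cascade_fovs(rates, kernel_size=3):
    """Field of view after each stage of a stride-1 cascade, where stage k
    consumes the output of stage k-1 (the assumed waterfall-style chain)."""
    fovs, fov = [], 1
    for r in rates:
        # Each stage widens the accumulated view by its extent minus one pixel.
        fov += dilated_kernel_extent(kernel_size, r) - 1
        fovs.append(fov)
    return fovs


if __name__ == "__main__":
    rates = [6, 12, 18, 24]  # assumed dilation rates, for illustration only
    print(parallel_fovs(rates))  # [13, 25, 37, 49]
    print(cascade_fovs(rates))   # [13, 37, 73, 121]
```

With the same four filters, the cascade reaches a larger field of view at each stage than the corresponding parallel branch, which is the intuition behind combining progressive filtering with multi-scale outputs.
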
Updated: 2021-11-02