Enhancing feature fusion for human pose estimation, Machine Vision and Applications

Enhancing feature fusion for human pose estimation
Machine Vision and Applications (IF 2.4), Pub Date: 2020-09-24, DOI: 10.1007/s00138-020-01104-2
Rui Wang, Jiangwei Tong, Xiangyang Wang

Current human pose estimation methods mainly rely on designing efficient Convolutional Neural Network (CNN) frameworks. These CNN architectures typically consist of high-to-low resolution sub-networks that learn semantic information, followed by low-to-high sub-networks that raise the resolution to locate the keypoints. Low-level features have high resolution but little semantic information, while high-level features have rich semantic information but few high-resolution details, so it is important to fuse features from different levels to improve the final performance. However, most existing models implement feature fusion by simply concatenating low-level and high-level features, without considering the gap between their spatial resolutions and semantic levels. In this paper, we propose a new feature fusion method for human pose estimation. We introduce high-level semantic information into low-level features to enhance feature fusion. Further, to keep both the high-level semantic information and the high-resolution location details, we use Global Convolutional Network blocks to bridge the gap between low-level and high-level features. Experiments on the MPII and LSP human pose estimation datasets demonstrate that efficient feature fusion can significantly improve performance. The code is available at: https://github.com/tongjiangwei/FeatureFusion.
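The abstract does not spell out the architecture, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation (see their repository for that). It assumes single-channel feature maps stored as nested Python lists, nearest-neighbour upsampling, fixed rather than learned kernels, and element-wise addition as the fusion step. The function names (upsample_nearest, conv_rows, conv_cols, gcn_block, fuse_features) are hypothetical. The block follows the commonly described Global Convolutional Network layout, in which a large k x k kernel is approximated by two branches of 1 x k and k x 1 convolutions whose outputs are summed.

```python
from typing import List

Map = List[List[float]]  # one single-channel feature map, indexed [row][col]


def upsample_nearest(x: Map, factor: int) -> Map:
    """Nearest-neighbour upsampling: repeat each value factor x factor times."""
    out: Map = []
    for row in x:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def conv_rows(x: Map, kernel: List[float]) -> Map:
    """1 x k cross-correlation along each row, zero padded (odd k assumed)."""
    k = len(kernel)
    pad = k // 2
    out: Map = []
    for row in x:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        out.append([sum(kernel[j] * padded[i + j] for j in range(k))
                    for i in range(len(row))])
    return out


def transpose(x: Map) -> Map:
    return [list(col) for col in zip(*x)]


def conv_cols(x: Map, kernel: List[float]) -> Map:
    """k x 1 cross-correlation along each column, via a transposed row pass."""
    return transpose(conv_rows(transpose(x), kernel))


def add(a: Map, b: Map) -> Map:
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def gcn_block(x: Map, a_col: List[float], a_row: List[float],
              b_row: List[float], b_col: List[float]) -> Map:
    """GCN-style block: (k x 1 then 1 x k) plus (1 x k then k x 1)."""
    branch_a = conv_rows(conv_cols(x, a_col), a_row)
    branch_b = conv_cols(conv_rows(x, b_row), b_col)
    return add(branch_a, branch_b)


def fuse_features(low: Map, high: Map, a_col: List[float], a_row: List[float],
                  b_row: List[float], b_col: List[float]) -> Map:
    """Assumed fusion: refine the low-level map, then add the upsampled high-level map."""
    if len(low) % len(high) or len(low[0]) % len(high[0]):
        raise ValueError("low-level size must be a multiple of high-level size")
    factor = len(low) // len(high)
    if len(low[0]) // len(high[0]) != factor:
        raise ValueError("row and column scale factors must match")
    refined = gcn_block(low, a_col, a_row, b_row, b_col)
    return add(refined, upsample_nearest(high, factor))


# Usage: a 4x4 high-resolution map fused with a 2x2 semantic map.
smooth = [0.25, 0.5, 0.25]  # arbitrary fixed kernel; a real network learns these
low_level = [[float(r == c) for c in range(4)] for r in range(4)]
high_level = [[1.0, 2.0], [3.0, 4.0]]
fused = fuse_features(low_level, high_level, smooth, smooth, smooth, smooth)
```

In a real network each branch would have its own learned, multi-channel weights, and the fused map would feed further layers that predict keypoint heatmaps. The sketch only shows how the two feature levels are brought to the same resolution before they are combined.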

Updated: 2020-09-24
