Attention Refined Network for Human Pose Estimation
Neural Processing Letters (IF 2.6), Pub Date: 2021-05-20, DOI: 10.1007/s11063-021-10523-9
Xiangyang Wang , Jiangwei Tong , Rui Wang

Recently, multi-scale feature fusion has come to be considered one of the most important issues in designing convolutional neural networks (CNNs). However, most existing methods directly add the corresponding layers together without considering the semantic gaps between them, which may lead to inadequate feature fusion. In this paper, we propose an attention refined network (HR-ARNet) to enhance multi-scale feature fusion for human pose estimation. The HR-ARNet employs channel and spatial attention mechanisms to reinforce important features and suppress unnecessary ones. To tackle the problem of inconsistency among keypoints, we utilize a self-attention strategy to model long-range keypoint dependencies. We also propose the focus loss, which modifies the commonly used squared-error loss function so that it mainly focuses on the top K 'hard' keypoints during training. The focus loss selects 'hard' keypoints based on the training loss and only backpropagates the gradients from the selected keypoints. Experiments on the human pose estimation benchmarks MPII Human Pose Dataset and COCO Keypoint Dataset show that our method can boost the performance of state-of-the-art human pose estimation networks, including HRNet (high-resolution net) (Sun et al., Proceedings of the IEEE conference on computer vision and pattern recognition, 2019). The code and models are available at: http://github/tongjiangwei/ARNet.
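The focus-loss idea described above, selecting the K keypoints with the largest per-keypoint error and backpropagating gradients only from those, can be sketched as follows. This is a minimal NumPy illustration of the selection rule, not the authors' implementation; the function name `focus_loss`, the heatmap shapes, and the default `top_k` value are assumptions for the example.

```python
import numpy as np

def focus_loss(pred_heatmaps, gt_heatmaps, top_k=8):
    """Sketch of a top-K 'hard' keypoint loss.

    pred_heatmaps, gt_heatmaps: arrays of shape (num_keypoints, H, W).
    Returns the mean squared error averaged over only the K keypoints
    with the largest per-keypoint loss, plus the selected indices.
    """
    # Per-keypoint mean squared error over the heatmap
    per_kpt = ((pred_heatmaps - gt_heatmaps) ** 2).mean(axis=(1, 2))
    # Indices of the K 'hardest' keypoints (largest loss)
    hard = np.argsort(per_kpt)[::-1][:top_k]
    # Mask out all other keypoints; in an autodiff framework, gradients
    # would then flow only through the selected keypoints
    mask = np.zeros_like(per_kpt)
    mask[hard] = 1.0
    return (per_kpt * mask).sum() / top_k, hard
```

In a real training loop the masking would be applied inside the computation graph (e.g. by multiplying the per-keypoint losses by a detached mask) so that only the selected keypoints contribute gradients.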




Updated: 2021-05-22