Scale-aware attention-based multi-resolution representation for multi-person pose estimation
Multimedia Systems ( IF 3.9 ) Pub Date : 2021-05-01 , DOI: 10.1007/s00530-021-00795-5
Honghong Yang , Longfei Guo , Xiaojun Wu , Yumei Zhang

The performance of multi-person pose estimation has improved significantly with the development of deep convolutional neural networks. However, two challenging issues remain largely ignored, even though they are key factors behind degraded keypoint localization: scale variation across human body parts, and the substantial loss of detail caused by consecutive strided operations followed by repeated upsampling. In this paper, we present a novel network, the Scale-aware attention-based Multi-resolution representation Network (SaMr-Net), which aims to make pose estimation robust to scale variation and to prevent the loss of detail during upsampling, leading to more precise keypoint estimation. The proposed architecture adopts the High-Resolution Network (HRNet) as its backbone. We first introduce dilated convolution into the backbone to enlarge the receptive field. We then devise an attention-based multi-scale feature fusion module that modifies the exchange units in HRNet, allowing the network to learn the weight of each fusion component. Finally, we design a scale-aware keypoint regressor that gradually integrates features from low to high resolution, enhancing invariance to scale when estimating the keypoints of body parts. We demonstrate the superiority of the proposed algorithm on two benchmark datasets: (1) the MS COCO keypoint benchmark and (2) the MPII human pose dataset. The comparison shows that our approach achieves superior results.
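The abstract rests on two mechanisms: dilated convolution, which widens the receptive field without adding parameters, and attention-weighted fusion, which replaces HRNet's plain summation of multi-resolution branches with a learned weighting. Since the paper's code is not reproduced here, the following is only a minimal pure-Python sketch of these two ideas on 1-D toy features; all function names are illustrative, not from the authors' implementation:

```python
import math

def dilated_conv1d(x, kernel, dilation=1):
    """1-D dilated convolution (valid padding).

    With dilation d and kernel size k, each output taps inputs spaced d
    apart, so the receptive field grows to d*(k-1)+1 at no parameter cost.
    """
    k = len(kernel)
    span = dilation * (k - 1) + 1
    return [
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fusion(branches, scores):
    """Fuse same-length feature vectors from different resolution branches.

    Instead of summing branches with equal weight (as in HRNet's exchange
    units), each branch is scaled by a softmax-normalized attention score;
    in the real network these scores would be predicted, not given.
    """
    weights = softmax(scores)
    return [
        sum(w * b[i] for w, b in zip(weights, branches))
        for i in range(len(branches[0]))
    ]

# A kernel of size 3 with dilation 2 covers 5 input positions:
features = dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], dilation=2)
# Equal attention scores reduce to a plain average of the branches:
fused = attention_fusion([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In the full network these operations act on 2-D feature maps and the attention scores are produced by a learned sub-module, but the weighting-and-summing pattern is the same.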


Updated: 2021-05-02