Dual attention module and multi-label based fully convolutional network for crowd counting, IET Computer Vision


Dual attention module and multi-label based fully convolutional network for crowd counting
IET Computer Vision (IF 1.5), Pub Date: 2020-11-16, DOI: 10.1049/iet-cvi.2019.0674
Suyu Wang (1, 2), Bin Yang (1, 2), Bo Liu (2), Guanghui Zheng (1, 2)


High-density crowd counting in natural scenes is a challenging research problem in computer vision. Although algorithms based on convolutional neural networks achieve significantly better results than traditional algorithms, most of them focus on local image features and have difficulty capturing rich global contextual dependencies. To address this problem, this study proposes a fully convolutional network based on a dual attention module and a multi-label mechanism. The authors improve the algorithm in three ways. First, the dual attention module adaptively integrates global context and long-range dependencies along both the spatial and channel dimensions, which improves the expressive ability of the network. Second, a multi-label mechanism reduces the prediction error: the crowd-counting task is additionally cast as a foreground/background segmentation task that assists the regression of the density map. Third, the structural similarity index (SSIM) is added to the traditional Euclidean distance loss and cross-entropy loss to further improve training. Test results on the UCF_CC_50, ShanghaiTech, and UCF-QNRF datasets indicate that the proposed method outperforms current mainstream algorithms.
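The abstract names three loss terms (Euclidean distance, cross-entropy, and structural similarity) but does not say how they are combined. The sketch below shows one plausible combination in NumPy and is not the authors' implementation. The function names, the weights `w_ce` and `w_ssim`, the stabilizing constants, the use of a single global SSIM value instead of a sliding window, and the threshold used to build the foreground mask are all assumptions made for illustration.

```python
import numpy as np

def euclidean_loss(pred, target):
    """Half the mean squared difference between two density maps."""
    return 0.5 * np.mean((pred - target) ** 2)

def cross_entropy_loss(prob_fg, mask, eps=1e-7):
    """Binary cross-entropy between foreground probabilities and a 0/1 mask."""
    p = np.clip(prob_fg, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p))

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """SSIM over the whole map (assumed simplification: no local windows)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def combined_loss(pred_density, gt_density, pred_fg_prob, gt_mask,
                  w_ce=0.1, w_ssim=0.1):
    """Weighted sum of the three terms; the weights are hypothetical."""
    l_den = euclidean_loss(pred_density, gt_density)
    l_seg = cross_entropy_loss(pred_fg_prob, gt_mask)
    l_ssim = 1.0 - global_ssim(pred_density, gt_density)
    return l_den + w_ce * l_seg + w_ssim * l_ssim

# Toy data: a random "density map" and a mask from an arbitrary threshold.
rng = np.random.default_rng(0)
gt = rng.random((8, 8))
mask = (gt > 0.5).astype(float)

perfect = combined_loss(gt, gt, mask, mask)
noisy = combined_loss(gt + 0.1 * rng.standard_normal((8, 8)), gt,
                      np.full((8, 8), 0.5), mask)
print(perfect, noisy)
```

With a perfect prediction all three terms are close to zero, and any mismatch in the density map or the segmentation output raises the total. In a real training setup the SSIM term would normally be computed over local windows, and the weights would be tuned on validation data.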


Last updated: 2020-11-17
