Cascade-guided multi-scale attention network for crowd counting
Signal, Image and Video Processing ( IF 2.0 ) Pub Date : 2021-04-15 , DOI: 10.1007/s11760-021-01903-8
Shufang Li , Zhengping Hu , Mengyao Zhao , Zhe Sun

Deep learning has greatly improved the performance of crowd counting based on density estimation. However, obtaining a high-quality density map remains a major challenge because of background clutter and the interference of perspective changes within and between scenes. In this paper, we propose a cascade-guided crowd counting network built mainly around a scale-aware model (SAM) and an attention-aware model (AAM). First, SAM uses a share-net design and multi-directional perspective transforms in convolution to handle multi-scale variation and smooth transitions, while reducing the background noise in shallow features. Second, AAM further encodes semantic interdependencies using the two-dimensional features of location and channel, so that the network learns to attend to the key information. Finally, the global and local features are concatenated and fed into a decoder to generate the estimated density map for crowd counting. Comprehensive experiments on three established datasets show that the proposed method not only achieves higher accuracy but is also more robust to scale variation and background noise.
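For readers unfamiliar with density estimation, the following is a minimal sketch of how a density map relates to a crowd count. It is not the network proposed in the paper. It only illustrates the common convention that each annotated head contributes a unit-mass Gaussian, so that summing the map recovers the count. The function names (`gaussian_kernel`, `density_map`, `count_from_density`), the kernel size, and the value of `sigma` are illustrative assumptions, not values taken from the paper.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalised to sum to 1."""
    half = size // 2
    kernel = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(points, height, width, size=7, sigma=1.5):
    """Build a density map from head annotations given as (row, col) pairs.

    size and sigma are assumed values chosen only for illustration.
    """
    half = size // 2
    kernel = gaussian_kernel(size, sigma)
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in points:
        # Keep only the kernel cells that fall inside the image.
        cells = []
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                y, x = r + dy, c + dx
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy + half][dx + half]))
        if not cells:
            continue  # annotation lies entirely outside the image
        # Renormalise so every person contributes exactly 1 to the sum,
        # even when the kernel is clipped at the image border.
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            dmap[y][x] += w / mass
    return dmap


def count_from_density(dmap):
    """The estimated crowd count is the sum over the density map."""
    return sum(sum(row) for row in dmap)


heads = [(2, 3), (10, 10), (0, 0)]  # hypothetical annotations
dmap = density_map(heads, 16, 16)
print(round(count_from_density(dmap), 6))  # prints 3.0
```

A density-estimation network is trained to regress maps of this kind from images, and the predicted count is read off by the same summation.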




Updated: 2021-04-16