Saliency detection in human crowd images of different density levels using attention mechanism, Signal Processing: Image Communication



Signal Processing: Image Communication (IF 3.4), Pub Date: 2020-08-15, DOI: 10.1016/j.image.2020.115976
Minh Tri Nguyen, Prarinya Siritanawan, Kazunori Kotani

The human visual system can rapidly identify important visual information and redirect attention to it, even in highly complex scenes such as human crowds. Saliency prediction in human crowd scenes is the process of using computer vision techniques to imitate the human visual system by predicting which areas of a crowd scene are likely to attract human attention. However, because human crowd scenes are highly complex, identifying which factors attract human attention is a challenging task. In this work, we propose Multiscale DenseNet – Dilated and Attention (MSDense-DAt), a convolutional neural network (CNN) that uses self-attention to integrate the result of knowledge-driven gaze in the human visual system in order to identify salient areas in human crowd scenes. To deal with the high complexity of human crowd images, our method combines several state-of-the-art deep learning components: a multiscale DenseNet for multiscale deep feature extraction, self-attention, and dilated convolution. We then evaluate the contribution of each component of the CNN architecture by comparing different combinations of components. Finally, we further evaluate the proposed method at different crowd density levels to assess the effect of crowd density on model performance.
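The abstract names two building blocks, dilated convolution and self-attention, without giving their configuration. The NumPy sketch below only illustrates what each operation computes in a generic textbook form; it is not the authors' MSDense-DAt implementation. All names (`dilated_conv2d`, `spatial_self_attention`, `Wq`, `Wk`, `Wv`, `gamma`) are hypothetical, the attention block assumes a SAGAN-style query/key/value design over spatial positions with a residual weight `gamma`, and all weights are random rather than learned.

```python
import numpy as np


def dilated_conv2d(x, w, dilation=1):
    """'Valid' 2-D cross-correlation of a single-channel map x with a dilated k x k kernel w."""
    k = w.shape[0]
    span = dilation * (k - 1) + 1          # effective kernel size along each axis
    h, wd = x.shape
    out = np.zeros((h - span + 1, wd - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Take every `dilation`-th pixel inside the span window: k x k samples.
            patch = x[i:i + span:dilation, j:j + span:dilation]
            out[i, j] = np.sum(patch * w)
    return out


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(feat, Wq, Wk, Wv, gamma=0.0):
    """Generic (assumed SAGAN-style) self-attention over the positions of a (C, H, W) map.

    Wq, Wk: (C', C) and Wv: (C, C) play the role of 1x1 convolutions.
    Returns feat + gamma * attended features, with the same shape as feat.
    """
    c, h, wd = feat.shape
    x = feat.reshape(c, h * wd)              # column n holds the features of position n
    q, k, v = Wq @ x, Wk @ x, Wv @ x
    attn = softmax(q.T @ k, axis=-1)         # (N, N); row i sums to 1 over positions j
    out = v @ attn.T                         # out[:, i] = sum_j attn[i, j] * v[:, j]
    return feat + gamma * out.reshape(c, h, wd)


rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
w = rng.standard_normal((3, 3))
y1 = dilated_conv2d(img, w, dilation=1)   # 3x3 receptive field
y2 = dilated_conv2d(img, w, dilation=2)   # 5x5 receptive field, still 9 weights

feat = rng.standard_normal((8, 4, 4))     # hypothetical 8-channel 4x4 feature map
Wq = rng.standard_normal((2, 8))
Wk = rng.standard_normal((2, 8))
Wv = rng.standard_normal((8, 8))
z = spatial_self_attention(feat, Wq, Wk, Wv, gamma=0.5)

print(y1.shape, y2.shape, z.shape)  # (14, 14) (12, 12) (8, 4, 4)
```

Dilation enlarges the receptive field (5×5 for a 3×3 kernel at dilation 2) without adding weights, which is one common way to capture wider context in dense scenes; the attention step lets every spatial position aggregate features from all other positions. In a trained network the projection matrices and `gamma` would be learned.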


Updated: 2020-08-15
