Visual saliency prediction using multi-scale attention gated network, Multimedia Systems

Visual saliency prediction using multi-scale attention gated network
Multimedia Systems (IF 3.9), Pub Date: 2021-05-19, DOI: 10.1007/s00530-021-00796-4
Yubao Sun, Mengyang Zhao, Kai Hu, Shaojing Fan

Predicting human visual attention can not only increase our understanding of the underlying biological mechanisms, but also bring new insights for other computer vision-related tasks such as autonomous driving and human-computer interaction. Current deep learning-based methods often emphasize high-level semantic features for prediction. However, high-level semantic features lack fine-scale spatial information. Ideally, a saliency prediction model should include both spatial and semantic features. In this paper, we propose a multi-scale attention gated network (referred to as MSAGNet) that fuses semantic features of different spatial resolutions for visual saliency prediction. Specifically, we adopt the high-resolution network (HRNet) as the backbone to extract multi-scale semantic features. A multi-scale attention gating module is designed to adaptively fuse these multi-scale features in a hierarchical way. Unlike the conventional approach of concatenating features from multiple layers or multi-scale inputs, this module computes a spatial attention map from the high-level semantic features and then fuses it with the low-level spatial features through a gating operation. Through this hierarchical gated fusion, the final saliency prediction is produced at the finest scale. Extensive experimental analyses on three benchmark datasets demonstrate the superior performance of the proposed method.
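The gating idea in the abstract can be illustrated with a small sketch. The abstract does not give the layer-level design of MSAGNet, so everything below is an assumption made for illustration: the function names (`spatial_attention`, `gate`, `hierarchical_fusion`) are hypothetical, a channel mean followed by a sigmoid stands in for whatever learned layers the authors use to compute the attention map, nearest-neighbour upsampling stands in for their resizing step, and the gate is taken to be element-wise multiplication. Features are plain nested lists shaped channels x height x width.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2D map (a list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out


def spatial_attention(high):
    """Collapse a coarse feature (C x H x W) into one H x W map in (0, 1).

    Assumption: channel mean + sigmoid replaces a learned projection.
    """
    channels = len(high)
    h, w = len(high[0]), len(high[0][0])
    return [
        [sigmoid(sum(high[c][i][j] for c in range(channels)) / channels)
         for j in range(w)]
        for i in range(h)
    ]


def gate(low, high):
    """Gate a fine feature (C x H x W) with attention from a coarser one.

    Assumes the fine resolution is an integer multiple of the coarse one.
    """
    factor = len(low[0]) // len(high[0])
    att = upsample_nearest(spatial_attention(high), factor)
    return [
        [[v * a for v, a in zip(row, att_row)] for row, att_row in zip(ch, att)]
        for ch in low
    ]


def hierarchical_fusion(features):
    """Fuse features ordered finest -> coarsest; returns a finest-scale feature."""
    fused = features[-1]
    for low in reversed(features[:-1]):
        fused = gate(low, fused)
    return fused
```

In this sketch the coarsest feature only ever acts as a gate, and each finer feature is re-weighted by the attention derived from the level below it, so the output keeps the spatial resolution of the finest input. That mirrors the coarse-to-fine order described in the abstract, not the paper's actual architecture.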

Updated: 2021-05-19
