Exploiting object features in deep gaze prediction models, Journal of Visual Communication and Image Representation

Exploiting object features in deep gaze prediction models
Journal of Visual Communication and Image Representation ( IF 2.6 ) Pub Date : 2020-10-02 , DOI: 10.1016/j.jvcir.2020.102931
Samad Zabihi , Eghbal Mansoori , Mehran Yazdi

The human visual system analyzes complex scenes rapidly. It devotes its limited perceptual resources to the most salient subsets and/or objects of a scene while ignoring the less salient parts. Gaze prediction models try to predict human eye fixations (human gaze) under free-viewing conditions while imitating this attentive mechanism. Previous studies on saliency benchmark datasets have shown that visual attention is affected by the salient objects in a scene and their features. These features include the identity, location, and visual features of the objects, in addition to the context of the input image. Moreover, human eye fixations often converge on specific parts of the salient objects. In this paper, we propose a deep gaze prediction model that uses object detection via image segmentation. It uses several deep neural modules to find the identity, location, and visual features of the salient objects in a scene. In addition, we introduce a deep module to capture the prior bias of human eye fixations. To evaluate our model, we use several challenging saliency benchmark datasets in the experiments. We also conduct an ablation study to show the effectiveness of the proposed modules and of the overall architecture. Despite having fewer parameters, our model achieves performance comparable to, and on some datasets better than, that of state-of-the-art saliency models.

Updated: 2020-11-03
