Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations
Cognition (IF 2.8), Pub Date: 2020-10-20, DOI: 10.1016/j.cognition.2020.104465
Marek A. Pedziwiatr, Matthias Kümmerer, Thomas S. A. Wallis, Matthias Bethge, Christoph Teufel

Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique for capturing the distribution of semantic information across an image, have recently been proposed to support the hypothesis that meaning rather than image features guides human gaze. MMs have the potential to be an important tool far beyond eye-movement research. Here, we examine central assumptions underlying MMs. First, we compared the fixation-prediction performance of MMs with that of saliency models, showing that DeepGaze II — a deep neural network trained to predict fixations based on high-level features rather than meaning — outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge central assumptions underlying the use of MMs to measure the distribution of meaning in images.
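Comparisons like the one the abstract describes score a model's saliency map against human fixation locations. The paper's own evaluation uses more sophisticated measures, but one widely used metric, Normalized Scanpath Saliency (NSS), captures the basic idea: z-score the map, then average its values at the fixated pixels. The sketch below uses toy data and hypothetical maps, not the paper's stimuli or models:

```python
import numpy as np

def nss(saliency_map, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels.
    `fixations` is a list of (row, col) coordinates of human fixations."""
    z = (saliency_map - saliency_map.mean()) / saliency_map.std()
    return float(np.mean([z[r, c] for r, c in fixations]))

# Toy comparison: two candidate maps scored on the same fixations.
rng = np.random.default_rng(0)
map_a = rng.random((64, 64))          # baseline map with no structure
map_b = map_a.copy()
fixations = [(10, 20), (30, 40), (50, 12)]
map_b[10, 20] += 5.0                  # map_b concentrates mass on a fixated pixel

# A map that assigns higher (z-scored) saliency to fixated locations
# earns a higher NSS, i.e. predicts the fixations better.
print(nss(map_a, fixations), nss(map_b, fixations))
```

A model ranking produced this way depends on the metric chosen; NSS rewards maps that are both selective and centered on fixated regions, which is why benchmark comparisons typically report several complementary metrics.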




Updated: 2020-10-30