Visual and Affective Multimodal Models of Word Meaning in Language and Mind
Cognitive Science (IF 2.3), Pub Date: 2021-01-11, DOI: 10.1111/cogs.12922
Simon De Deyne, Danielle J. Navarro, Guillem Collell, Andrew Perfors

One of the main limitations of natural language‐based approaches to meaning is that they do not incorporate multimodal representations the way humans do. In this study, we evaluate how well different kinds of models account for people's representations of both concrete and abstract concepts. The models we compare include unimodal distributional linguistic models as well as multimodal models which combine linguistic with perceptual or affective information. There are two types of linguistic models: those based on text corpora and those derived from word association data. We present two new studies and a reanalysis of a series of previous studies. The studies demonstrate that both visual and affective multimodal models better capture behavior that reflects human representations than unimodal linguistic models. The size of the multimodal advantage depends on the nature of semantic representations involved, and it is especially pronounced for basic‐level concepts that belong to the same superordinate category. Additional visual and affective features improve the accuracy of linguistic models based on text corpora more than those based on word associations; this suggests systematic qualitative differences between what information is encoded in natural language versus what information is reflected in word associations. Altogether, our work presents new evidence that multimodal information is important for capturing both abstract and concrete words and that fully representing word meaning requires more than purely linguistic information. Implications for both embodied and distributional views of semantic representation are discussed.
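To make the fusion idea concrete, here is a minimal Python sketch, assuming a simple concatenation-style fusion rather than the authors' actual models: unimodal text vectors are combined with visual and affective feature vectors, and the fused space is scored against human relatedness judgments via Spearman correlation. The word list, all vectors, and the `human_ratings` values are made-up toy data for illustration only.

```python
# Illustrative sketch (not the paper's code): fuse a text-based word
# embedding with visual and affective features by concatenation, then
# score the fused space against (toy) human relatedness judgments.
import numpy as np
from scipy.stats import spearmanr

def l2_normalize(v):
    return v / np.linalg.norm(v)

def fuse(text_vec, visual_vec, affect_vec):
    # Simple middle fusion: normalize each modality, then concatenate,
    # so no single modality dominates purely by vector magnitude.
    return np.concatenate([l2_normalize(text_vec),
                           l2_normalize(visual_vec),
                           l2_normalize(affect_vec)])

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors: 5-d text, 3-d visual, 2-d affective (e.g. valence, arousal).
rng = np.random.default_rng(0)
words = ["dog", "cat", "justice"]
text = {w: rng.normal(size=5) for w in words}
visual = {w: rng.normal(size=3) for w in words}
affect = {w: rng.normal(size=2) for w in words}

fused = {w: fuse(text[w], visual[w], affect[w]) for w in words}

# Evaluate: correlate model similarities with hypothetical human ratings.
pairs = [("dog", "cat"), ("dog", "justice"), ("cat", "justice")]
human_ratings = [0.9, 0.2, 0.15]  # made-up relatedness judgments
model_sims = [cosine(fused[a], fused[b]) for a, b in pairs]
rho, _ = spearmanr(model_sims, human_ratings)
print(f"Spearman rho vs. human judgments: {rho:.2f}")
```

In studies of this kind, the comparison of interest is whether the fused representation correlates with human judgments more strongly than the text vectors alone; the same scoring loop can be run on `text` by itself to obtain that unimodal baseline.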

Updated: 2021-01-12