Multimodal sentiment analysis with unidirectional modality translation
Neurocomputing (IF 6), Pub Date: 2021-09-20, DOI: 10.1016/j.neucom.2021.09.041
Bo Yang, Bo Shao, Lijun Wu, Xiaola Lin
Multimodal Sentiment Analysis (MSA) is a challenging research area that investigates sentiment expressed across multiple heterogeneous sources of information. To integrate multimodal information including the text, visual, and audio modalities, state-of-the-art models focus on developing various fusion strategies, such as attention and the outer product. However, the inferior quality of visual and audio features commonly observed in this area has not received much attention. We argue that this issue considerably limits the performance of such fusion strategies. Therefore, in this paper, we propose Multimodal Translation for Sentiment Analysis (MTSA), a multimodal framework that improves the quality of visual and audio features by translating them into the space of text features extracted by Bidirectional Encoder Representations from Transformers (BERT). Experiments on the two benchmark datasets CMU-MOSI and CMU-MOSEI show that our model outperforms state-of-the-art methods on both datasets across all metrics, which illustrates the effectiveness of our method.
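The abstract only sketches the core idea of MTSA: translate audio and visual features into the BERT text feature space before fusion, so that the weaker modalities inherit the quality of the text representation. Since the full paper is not reproduced on this page, the PyTorch snippet below is merely a rough, hypothetical illustration of that idea. The module names, feature dimensions, mean-pooling fusion, and MSE-based translation objective are assumptions for illustration, not the authors' actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityTranslator(nn.Module):
    """Map an audio or visual feature sequence into the BERT text feature space (illustrative)."""

    def __init__(self, src_dim, text_dim=768, n_heads=8, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(src_dim, text_dim)
        layer = nn.TransformerEncoderLayer(d_model=text_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, src_feats):
        # src_feats: (batch, seq_len, src_dim) audio or visual sequence
        return self.encoder(self.proj(src_feats))  # (batch, seq_len, text_dim)


class MTSASketch(nn.Module):
    """Hypothetical end-to-end sketch: translate audio/visual to the text space, then fuse."""

    def __init__(self, audio_dim=74, visual_dim=35, text_dim=768):
        # audio_dim / visual_dim are placeholders; actual values depend on the dataset features
        super().__init__()
        self.audio2text = ModalityTranslator(audio_dim, text_dim)
        self.visual2text = ModalityTranslator(visual_dim, text_dim)
        self.regressor = nn.Sequential(
            nn.Linear(3 * text_dim, text_dim), nn.ReLU(), nn.Linear(text_dim, 1)
        )

    def forward(self, text_feats, audio_feats, visual_feats):
        # text_feats: precomputed BERT hidden states, (batch, seq_len, 768)
        t = text_feats.mean(dim=1)                       # pooled text representation
        a = self.audio2text(audio_feats).mean(dim=1)     # pooled translated audio
        v = self.visual2text(visual_feats).mean(dim=1)   # pooled translated visual
        return self.regressor(torch.cat([t, a, v], dim=-1)).squeeze(-1)

    def translation_loss(self, text_feats, audio_feats, visual_feats):
        # assumed auxiliary objective pulling translated features toward the text space
        t = text_feats.mean(dim=1)
        a = self.audio2text(audio_feats).mean(dim=1)
        v = self.visual2text(visual_feats).mean(dim=1)
        return F.mse_loss(a, t) + F.mse_loss(v, t)
```

As a usage sketch, the sentiment regression loss and the translation loss would be combined during training, e.g. `loss = F.l1_loss(model(t, a, v), labels) + 0.1 * model.translation_loss(t, a, v)`, with the weighting factor chosen by validation; this weighting is likewise an assumption.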

Updated: 2021-10-14