Cross-Modal Multitask Transformer for End-to-End Multimodal Aspect-Based Sentiment Analysis
Information Processing & Management (IF 7.4). Pub Date: 2022-08-02. DOI: 10.1016/j.ipm.2022.103038
Li Yang, Jin-Cheon Na, Jianfei Yu

As an emerging task in opinion mining, End-to-End Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to extract all the aspect-sentiment pairs mentioned in a sentence-image pair. Most existing MABSA methods do not explicitly incorporate aspect and sentiment information into their textual and visual representations, and they fail to account for the varying contributions of visual representations to each word or aspect in the text. To address these limitations, we propose a multi-task learning framework named Cross-Modal Multitask Transformer (CMMT), which incorporates two auxiliary tasks to learn aspect/sentiment-aware intra-modal representations, and introduces a Text-Guided Cross-Modal Interaction Module to dynamically control how much visual information contributes to the representation of each word during inter-modal interaction. Experimental results demonstrate that CMMT consistently outperforms the state-of-the-art approach JML by 3.1, 3.3, and 4.1 absolute percentage points on three Twitter datasets for the End-to-End MABSA task, respectively. Moreover, further analysis shows that CMMT is superior to comparison systems in both aspect extraction (AE) and sentiment classification (SC), suggesting that it can advance the development of multimodal AE and SC algorithms.
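To make the text-guided interaction concrete, below is a minimal PyTorch sketch of one plausible form of such a module: each text token attends over image-region features via cross-attention, and a per-token sigmoid gate computed from the text decides how much of the attended visual context that word absorbs. The class name, the dimensions, and the gating formulation are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TextGuidedCrossModalInteraction(nn.Module):
    """Hypothetical sketch of a text-guided cross-modal interaction layer.

    Each text token attends over visual region features (cross-attention),
    and a per-token gate, computed from the text side, scales how much
    visual context flows into that token's representation. All design
    details here are illustrative, not taken from the CMMT paper.
    """

    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # text:   (batch, seq_len, d_model)   token representations
        # visual: (batch, n_regions, d_model) projected image-region features
        vis_ctx, _ = self.cross_attn(query=text, key=visual, value=visual)
        # Per-token gate in [0, 1]: decides how much visual context each word keeps.
        g = self.gate(torch.cat([text, vis_ctx], dim=-1))
        return self.norm(text + g * vis_ctx)


# Toy usage: 4 sentences of 20 tokens, 49 image regions (e.g., a 7x7 grid).
fused = TextGuidedCrossModalInteraction()(torch.randn(4, 20, 768), torch.randn(4, 49, 768))
print(fused.shape)  # torch.Size([4, 20, 768])
```

The per-token gate is what realizes the "different contributions" idea from the abstract: a word such as an aspect mention can receive a gate near 1 and draw heavily on the image, while a function word can suppress the visual context almost entirely.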
Updated: 2022-08-03