Hierarchical Interactive Multimodal Transformer for Aspect-Based Multimodal Sentiment Analysis
IEEE Transactions on Affective Computing (IF 9.6) Pub Date: 2022-04-28, DOI: 10.1109/taffc.2022.3171091
Jianfei Yu, Kai Chen, Rui Xia

Aspect-based multimodal sentiment analysis (ABMSA) aims to determine the sentiment polarity of each aspect or entity mentioned in a multimodal post or review. Previous studies on ABMSA can be grouped into two subtasks: aspect-term based multimodal sentiment classification (ATMSC) and aspect-category based multimodal sentiment classification (ACMSC). However, these existing studies have three shortcomings: (1) they ignore object-level semantics in images; (2) they primarily focus on aspect-text and aspect-image interactions; and (3) they fail to consider the semantic gap between text and image representations. To tackle these issues, we propose a general Hierarchical Interactive Multimodal Transformer (HIMT) model for ABMSA. Specifically, we extract salient features with semantic concepts from images via an object detection method, and then propose a hierarchical interaction module that first models the aspect-text and aspect-image interactions, and then captures the text-image interactions. Moreover, an auxiliary reconstruction module is devised to largely eliminate the semantic gap between text and image representations. Experimental results show that our HIMT model significantly outperforms state-of-the-art methods on two benchmarks for ATMSC and one benchmark for ACMSC.
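The two-stage hierarchy described above (aspect-conditioned attention over each modality first, cross-modal fusion second) can be illustrated with a minimal sketch. This is not the authors' implementation: the scaled dot-product attention, the tensor shapes, and all variable names are illustrative assumptions, and real HIMT uses full Transformer layers over object-detector features.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: each query row attends over key/value rows."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 16                                   # illustrative feature dimension
aspect = rng.standard_normal((2, d))     # aspect-term token features
text = rng.standard_normal((10, d))      # sentence token features
image = rng.standard_normal((5, d))      # object-level image features (e.g., from a detector)

# Stage 1: model aspect-text and aspect-image interactions separately.
aspect_text = attention(aspect, text, text)
aspect_image = attention(aspect, image, image)

# Stage 2: capture text-image interactions over the aspect-aware representations.
fused = attention(aspect_text, aspect_image, aspect_image)

print(fused.shape)  # one fused vector per aspect token: (2, 16)
```

The ordering is the point of the sketch: conditioning each modality on the aspect before fusing keeps the final representation focused on the queried entity rather than the whole post.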

Updated: 2024-08-28