Sarcasm driven by sentiment: A sentiment-aware hierarchical fusion network for multimodal sarcasm detection, Information Fusion



Sarcasm driven by sentiment: A sentiment-aware hierarchical fusion network for multimodal sarcasm detection
Information Fusion (IF 18.6), Pub Date: 2024-03-13, DOI: 10.1016/j.inffus.2024.102353
Hao Liu, Runguo Wei, Geng Tu, Jiali Lin, Cheng Liu, Dazhi Jiang

Sarcasm is a form of sentiment expression that highlights the disparity between a person’s true intentions and the content they explicitly present. With the exponential increase in multimodal data on social platforms, sarcasm detection across modalities has become a pivotal area of research. Although previous studies have extensively examined multimodal feature extraction, fusion, and the modeling of inter-modal incongruities, they have often neglected the subtle sentiment cues inherent in sarcastic multimodal data. They have also not adequately addressed the sparse distribution of sarcastic features and the tenuous connections between them, both within and across modalities. To address these gaps, we introduce a hierarchical fusion model that integrates sentiment information for enhanced multimodal sarcasm detection. Specifically, we use attribute-object matching in the image modality and treat the result as an auxiliary attribute modality. Sentiment information is then extracted from each modality and combined to obtain a more comprehensive intra-modal representation. Moreover, we characterize inter-modal incongruity relationships using a cross-modal Transformer. We also introduce a sentiment-aware image-text contrastive loss to better align the semantics of images and text. By strengthening this alignment, our model is better equipped to capture incongruous relationships. Experiments demonstrate that our hierarchical fusion model achieves state-of-the-art performance on the multimodal sarcasm detection task.
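
The abstract does not spell out how the sentiment-aware image-text contrastive loss is defined, so the following is only a minimal sketch of the generic technique it builds on: a symmetric InfoNCE-style contrastive loss over matched image-text pairs. The function names, the `temperature` value, and the optional `pair_weights` argument are all assumptions made for illustration. In particular, `pair_weights` is a hypothetical hook showing where a per-pair sentiment score could enter; it is not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def image_text_contrastive_loss(image_embs, text_embs, temperature=0.07,
                                pair_weights=None):
    """Symmetric InfoNCE loss over a batch of matched (image, text) pairs.

    image_embs[i] and text_embs[i] are assumed to describe the same post.
    pair_weights is a hypothetical per-pair weight (for example, a sentiment
    score); it is an illustrative assumption, not the paper's formulation.
    """
    n = len(image_embs)
    if n == 0 or n != len(text_embs):
        raise ValueError("need equally sized, non-empty batches")
    weights = list(pair_weights) if pair_weights is not None else [1.0] * n
    if len(weights) != n or sum(weights) <= 0:
        raise ValueError("pair_weights must have one entry per pair and a positive sum")

    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]

    # sim[i][j]: cosine similarity of image i and text j, divided by temperature.
    sim = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
           for img in imgs]

    total = 0.0
    for i in range(n):
        # Image-to-text: the matching text i should win against all other texts.
        image_to_text = -log_softmax(sim[i])[i]
        # Text-to-image: the matching image i should win against all other images.
        text_to_image = -log_softmax([sim[j][i] for j in range(n)])[i]
        total += weights[i] * 0.5 * (image_to_text + text_to_image)
    return total / sum(weights)
```

With this sketch, a batch whose image and text embeddings line up pair by pair yields a loss near zero, while a batch with mismatched pairs yields a much larger loss, which is the pressure that pulls matched image and text semantics together.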


Updated: 2024-03-13
