Various syncretic co‐attention network for multimodal sentiment analysis,Concurrency and Computation: Practice and Experience

Various syncretic co‐attention network for multimodal sentiment analysis
Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2020-07-22, DOI: 10.1002/cpe.5954
Meng Cao, Yonghua Zhu, Wenjing Gao, Mengyao Li, Shaoxiu Wang

Multimedia content shared on social networks reveals public sentiment toward specific events, so real‐world applications need automatic sentiment analysis of the abundant multimedia data that the public posts. However, single‐modal approaches to sentiment analysis neglect the internal connections between textual and visual content, and current multimodal methods fail to exploit the multilevel semantic relations among heterogeneous features. This article proposes the various syncretic co‐attention network (VSCN), which mines the intricate multilevel correspondences between multimodal data and combines them with the unique information of each modality for integrated, complementary sentiment classification. Specifically, a multilevel co‐attention module explores localized correspondences between each image region and each text word, as well as holistic correspondences between global visual information and context‐based textual semantics. The single‐modal features are then fused separately at each of these levels. In addition to the fused multimodal features, VSCN also considers the unique information of each modality and integrates all of them into an end‐to‐end framework for sentiment analysis. Superior experimental results on three constructed real‐world datasets and on the Visual Sentiment Ontology (VSO) benchmark dataset demonstrate the effectiveness of VSCN, and qualitative analyses are provided to explain the method in more depth.
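The localized level of co‐attention described in the abstract, where every image region is related to every text word, can be illustrated with a generic affinity-matrix formulation. The sketch below is not the authors' VSCN implementation: the dot-product affinity, the softmax normalization, the function names, and the toy feature vectors are all assumptions chosen for illustration, and the holistic level and the final fusion step are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Combine vectors using the given attention weights."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def co_attention(regions, words):
    """Generic region-word co-attention (illustrative, not the paper's exact module).

    regions: list of image-region feature vectors
    words:   list of word feature vectors of the same dimension
    """
    # affinity[i][j] is the similarity between region i and word j
    # (a plain dot product here; this choice is an assumption).
    affinity = [[dot(r, w) for w in words] for r in regions]

    # Each region distributes its attention over the words (row-wise softmax).
    region_to_word = [softmax(row) for row in affinity]

    # Each word distributes its attention over the regions (column-wise softmax).
    word_to_region = [
        softmax([affinity[i][j] for i in range(len(regions))])
        for j in range(len(words))
    ]

    # Text summary seen by each region, and image summary seen by each word.
    attended_text = [weighted_sum(w, words) for w in region_to_word]
    attended_image = [weighted_sum(w, regions) for w in word_to_region]
    return region_to_word, word_to_region, attended_text, attended_image


# Toy 2-dimensional features: three image regions and two words (made-up values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = [[2.0, 0.0], [0.0, 2.0]]
r2w, w2r, att_text, att_image = co_attention(regions, words)
```

In this toy input the first region aligns with the first word, so it places most of its attention there, while the third region is equally similar to both words and splits its attention evenly. A full model would learn projection matrices for both modalities before computing the affinity rather than using raw features.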

Updated: 2020-07-22
