Hybrid Contrastive Learning of Tri-Modal Representation for Multimodal Sentiment Analysis
IEEE Transactions on Affective Computing (IF 9.6), Pub Date: 2022-05-03, DOI: 10.1109/taffc.2022.3172360
Sijie Mai, Ying Zeng, Shuangjia Zheng, Haifeng Hu

The wide application of smart devices enables the availability of multimodal data, which can be utilized in many tasks. In the field of multimodal sentiment analysis, most previous works focus on exploring intra- and inter-modal interactions. However, training a network with cross-modal information (language, audio, and visual) remains challenging due to the modality gap. Besides, while learning the dynamics within each sample has drawn great attention, the learning of inter-sample and inter-class relationships is neglected. Moreover, the limited size of datasets restricts the generalization ability of the models. To address the aforementioned issues, we propose HyCon, a novel framework for hybrid contrastive learning of tri-modal representation. Specifically, we simultaneously perform intra-/inter-modal contrastive learning and semi-contrastive learning, with which the model can fully explore cross-modal interactions, learn inter-sample and inter-class relationships, and reduce the modality gap. In addition, a refinement term and a modality margin are introduced to enable better learning of unimodal pairs. Furthermore, we devise a pair selection mechanism to identify and assign weights to the informative negative and positive pairs. HyCon can naturally generate many training pairs for better generalization, reducing the negative effect of limited datasets. Extensive experiments demonstrate that our method outperforms baselines on multimodal sentiment analysis and emotion recognition.
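For readers unfamiliar with the contrastive objectives the abstract refers to, the sketch below illustrates a generic InfoNCE-style inter-modal contrastive loss in PyTorch: representations of the same sample from two modalities form a positive pair, while representations of other samples in the batch serve as negatives. This is a minimal illustration, not the authors' HyCon implementation; the function name, the temperature value, and the plain symmetric cross-entropy form (without HyCon's refinement term, modality margin, semi-contrastive term, or pair weighting) are all assumptions.

```python
# Minimal sketch of an inter-modal contrastive (InfoNCE-style) loss.
# NOT the paper's HyCon loss: HyCon additionally uses semi-contrastive
# learning, a refinement term, a modality margin, and pair weighting.
import torch
import torch.nn.functional as F

def inter_modal_contrastive_loss(z_lang: torch.Tensor,
                                 z_audio: torch.Tensor,
                                 temperature: float = 0.1) -> torch.Tensor:
    """z_lang, z_audio: (batch, dim) unimodal representations of the
    same batch of samples from two different modalities."""
    # L2-normalize so dot products become cosine similarities.
    z_lang = F.normalize(z_lang, dim=-1)
    z_audio = F.normalize(z_audio, dim=-1)
    # (batch, batch) similarity matrix; the diagonal holds positive
    # pairs (same sample, different modality), off-diagonal negatives.
    logits = z_lang @ z_audio.t() / temperature
    targets = torch.arange(z_lang.size(0), device=z_lang.device)
    # Symmetric cross-entropy: align language->audio and audio->language.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    # Random features standing in for unimodal encoder outputs.
    lang = torch.randn(8, 128)
    audio = torch.randn(8, 128)
    print(inter_modal_contrastive_loss(lang, audio))
```

In this formulation, minimizing the loss pulls the two modality views of each sample together while pushing apart views of different samples, which is one way to shrink the modality gap the abstract describes; the paper's intra-modal and semi-contrastive variants apply the same idea within a modality and to label-defined pairs, respectively.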

Updated: 2022-05-03