Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching
arXiv - CS - Multimedia Pub Date : 2020-09-05 , DOI: arxiv-2009.02598
Jingjun Liang, Ruichen Li and Qin Jin

Automatic emotion recognition is an active research topic with a wide range of applications. Due to the high cost of manual annotation and inevitable label ambiguity, emotion recognition datasets are limited in both scale and quality. Therefore, one key challenge is how to build effective models with limited data resources. Previous works have explored different approaches to tackle this challenge, including data enhancement, transfer learning, and semi-supervised learning. However, these existing approaches suffer from weaknesses such as training instability, large performance loss during transfer, or only marginal improvements. In this work, we propose a novel semi-supervised multi-modal emotion recognition model based on cross-modality distribution matching, which leverages abundant unlabeled data to enhance model training under the assumption that the inner emotional status is consistent across modalities at the utterance level. We conduct extensive experiments to evaluate the proposed model on two benchmark datasets, IEMOCAP and MELD. The experimental results show that the proposed semi-supervised learning model can effectively utilize unlabeled data and combine multiple modalities to boost emotion recognition performance, outperforming other state-of-the-art approaches under the same conditions. The proposed model also achieves competitive performance compared with existing approaches that take advantage of additional auxiliary information such as speaker identity and interaction context.
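The abstract does not specify the exact form of the cross-modality distribution matching objective. As an illustration only, one common way to enforce utterance-level consistency across modalities on unlabeled data is to penalize the divergence between the emotion distributions predicted from each modality; the sketch below (using a symmetric KL divergence, an assumption, not necessarily the paper's formulation) shows the idea with hypothetical audio and text classifier logits:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def symmetric_kl(p, q, eps=1e-8):
    # Symmetric KL divergence between two categorical distributions.
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)), axis=-1)
    return 0.5 * (kl_pq + kl_qp)

def cross_modal_consistency_loss(audio_logits, text_logits):
    # For unlabeled utterances: penalize disagreement between the
    # per-modality emotion distributions (illustrative sketch; the
    # paper's actual matching objective may differ).
    p_audio = softmax(audio_logits)
    p_text = softmax(text_logits)
    return symmetric_kl(p_audio, p_text).mean()

# Toy example: 2 unlabeled utterances, 4 emotion classes.
audio = np.array([[2.0, 0.1, -1.0, 0.3], [0.0, 1.5, 0.2, -0.5]])
text = np.array([[1.8, 0.0, -0.8, 0.5], [0.1, 1.2, 0.4, -0.3]])
loss = cross_modal_consistency_loss(audio, text)
```

Such a consistency term is typically added to the supervised classification loss on labeled data, so that unlabeled utterances still constrain the model.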

Updated: 2020-09-08