Metric Learning-Based Multimodal Audio-Visual Emotion Recognition
IEEE Multimedia (IF 2.3), Pub Date: 2019-12-17, DOI: 10.1109/mmul.2019.2960219
Esam Ghaleb , Mirela Popa , Stylianos Asteriadis

People express their emotions through multiple channels, such as the visual and audio ones. Consequently, automatic emotion recognition can benefit significantly from multimodal learning. Even though each modality exhibits unique characteristics, multimodal learning takes advantage of the complementary information that diverse modalities provide when measuring the same instance, resulting in an enhanced understanding of emotions. Yet, these dependencies and relations are not fully exploited in audio–video emotion recognition. Furthermore, learning an effective metric through multimodality is a crucial goal for many applications in machine learning. Therefore, in this article, we propose Multimodal Emotion Recognition Metric Learning (MERML), which is learned jointly to obtain a discriminative score and a robust representation in a latent space for both modalities. The learned metric is then used efficiently through a radial basis function (RBF) based support vector machine (SVM) kernel. The evaluation of our framework shows significant performance, improving on the state-of-the-art results on the eNTERFACE and CREMA-D datasets.
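To make the kernel idea concrete, below is a minimal sketch, not the authors' implementation: it learns a linear metric L over concatenated audio-visual features with a toy contrastive objective, then uses the identity exp(-γ‖Lx − Ly‖²) = RBF(Lx, Ly) to realize the learned Mahalanobis-metric kernel (M = LᵀL) with a standard scikit-learn RBF SVM. The feature dimensions, synthetic data, and training objective are illustrative assumptions.

```python
# Sketch of a learned-metric RBF-SVM pipeline in the spirit of MERML
# (simplified stand-in, not the paper's algorithm). Projecting features
# with L and applying a standard RBF SVC is equivalent to using an RBF
# kernel under the Mahalanobis metric M = L^T L on the raw features.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical per-clip features: audio descriptors + visual embeddings,
# concatenated into one vector per sample (dimensions are assumptions).
n, d_audio, d_visual = 200, 20, 30
X = rng.normal(size=(n, d_audio + d_visual))
y = rng.integers(0, 6, size=n)  # six basic emotion labels

def learn_metric(X, y, out_dim=16, lr=1e-2, steps=200):
    """Toy contrastive metric learning: pull same-emotion pairs together,
    push different-emotion pairs apart, by gradient descent on L."""
    d = X.shape[1]
    L = rng.normal(scale=0.1, size=(out_dim, d))
    for _ in range(steps):
        i, j = rng.integers(0, len(X), size=2)
        diff = (X[i] - X[j])[:, None]            # column vector, shape (d, 1)
        sign = 1.0 if y[i] == y[j] else -1.0     # attract same class, repel different
        # gradient of sign * ||L diff||^2 w.r.t. L is 2 * sign * (L diff) diff^T
        L -= lr * 2.0 * sign * (L @ diff) @ diff.T
        L /= max(1.0, np.linalg.norm(L))         # keep L bounded (stabilizer)
    return L

L = learn_metric(X, y)
Z = X @ L.T  # project into the learned latent space

# RBF SVM on projected features == RBF kernel under the learned metric.
clf = SVC(kernel="rbf", gamma="scale").fit(Z, y)
print("train accuracy:", clf.score(Z, y))
```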

Updated: 2020-04-22