Exploiting Multi-CNN Features in CNN-RNN Based Dimensional Emotion Recognition on the OMG in-the-Wild Dataset
IEEE Transactions on Affective Computing ( IF 11.2 ) Pub Date : 2020-08-04 , DOI: 10.1109/taffc.2020.3014171
Dimitrios Kollias , Stefanos P. Zafeiriou

This article presents a novel CNN-RNN based approach that exploits multiple CNN features for dimensional emotion recognition in-the-wild, using the One-Minute Gradual-Emotion (OMG-Emotion) dataset. The approach first pre-trains on the relevant, large-scale Aff-Wild and Aff-Wild2 emotion databases. Low-, mid- and high-level features are then extracted from the trained CNN component and exploited by RNN subnets in a multi-task framework. The subnets' outputs constitute intermediate-level predictions; final estimates are obtained as the mean or median of these predictions. Fusion of the networks, at decision level or at model level, is also examined to boost performance; in the latter case an RNN performs the fusion. Although it uses only the visual modality, the approach outperforms state-of-the-art methods that utilize both audio and visual modalities. Some of these developments were submitted to the OMG-Emotion Challenge, ranking second among the entries that used only visual information for valence estimation and third overall. Extensive experimentation further shows that arousal estimation improves greatly when low-level features are combined with high-level ones.
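The decision-level fusion step described above can be sketched in a few lines: each RNN subnet (fed low-, mid-, or high-level CNN features) emits an intermediate per-frame prediction, and the final estimate is the mean or median across subnets. The example below is a minimal illustration with made-up numbers, not the paper's implementation; the subnet names and values are assumptions for demonstration only.

```python
from statistics import mean, median

# Hypothetical per-frame valence predictions from three RNN subnets,
# each driven by CNN features taken at a different depth.
# The values are illustrative, not results from the paper.
subnet_preds = {
    "low":  [0.10, 0.25, 0.40],
    "mid":  [0.20, 0.30, 0.35],
    "high": [0.15, 0.35, 0.45],
}

def fuse(preds, how="mean"):
    """Decision-level fusion: aggregate the subnets' intermediate
    predictions frame by frame using the mean or the median."""
    agg = mean if how == "mean" else median
    frames = zip(*preds.values())          # group predictions per frame
    return [round(agg(frame), 4) for frame in frames]

print(fuse(subnet_preds, how="mean"))
print(fuse(subnet_preds, how="median"))
```

Model-level fusion would instead feed the three intermediate prediction streams into a further RNN, letting it learn the combination rather than fixing it to a mean or median.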

Updated: 2020-08-04