Feature Pooling of Modulation Spectrum Features for Improved Speech Emotion Recognition in the wild, IEEE Transactions on Affective Computing



IEEE Transactions on Affective Computing (IF 9.6), Pub Date: 2018-01-01, DOI: 10.1109/taffc.2018.2858255
Anderson Raymundo Avila, Zahid Akhtar Momin, Joao Felipe Santos, Douglas O'Shaughnessy, Tiago Henrique Falk

Interest in affective computing is burgeoning, in great part due to its role in emerging affective human-computer interfaces (HCI). To date, the majority of existing research on automated emotion analysis has relied on data collected in controlled environments. With the rise of HCI applications on mobile devices, however, so-called “in-the-wild” settings have posed a serious threat to emotion recognition systems, particularly those based on voice. In this case, environmental factors such as ambient noise and reverberation severely hamper system performance. In this paper, we quantify the detrimental effects that the environment has on emotion recognition and explore the benefits achievable with speech enhancement. Moreover, we propose a modulation spectral feature pooling scheme that is shown to outperform a state-of-the-art benchmark system for environment-robust prediction of spontaneous arousal and valence emotional primitives. Experiments on an environment-corrupted version of the RECOLA dataset of spontaneous interactions show that the proposed feature pooling scheme, combined with speech enhancement, outperforms the benchmark across different noise-only, reverberation-only and noise-plus-reverberation conditions. Additional tests with the SEWA database show the benefits of the proposed method for in-the-wild applications.
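
The abstract names a modulation spectral feature pooling scheme but does not describe how the features are computed or pooled. Purely as an illustration of the general idea, the Python sketch below computes a generic modulation spectrum (the Fourier transform, along time, of each acoustic frequency bin's magnitude envelope) and then pools per-segment band energies into one fixed-length vector. It is not the authors' feature extractor: the STFT front end, the 256-sample frames with a 64-sample hop, the 64 Hz modulation cutoff, the 8 x 4 band grid, mean/std pooling, and the function names `stft_magnitude`, `modulation_spectrum` and `pooled_features` are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical sketch: NOT the paper's feature extractor. All parameters
# (frame/hop sizes, 64 Hz cutoff, band grid, mean/std pooling) are assumptions.

def stft_magnitude(x, frame_len=256, hop=64):
    """Hann-windowed magnitude spectrogram, shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def modulation_spectrum(x, fs, frame_len=256, hop=64, max_mod_hz=64.0):
    """Return (mod_spec, mod_freqs); mod_spec has shape (acoustic_bins, mod_bins).

    Each acoustic bin's magnitude trajectory is treated as a temporal
    envelope and Fourier-transformed to obtain modulation frequencies.
    """
    spec = stft_magnitude(x, frame_len, hop)            # (time, acoustic_freq)
    env = spec - spec.mean(axis=0, keepdims=True)       # drop each envelope's DC
    mod = np.abs(np.fft.rfft(env, axis=0))              # (mod_freq, acoustic_freq)
    mod_freqs = np.fft.rfftfreq(env.shape[0], d=hop / fs)
    keep = mod_freqs <= max_mod_hz                      # keep low modulation rates
    return mod[keep].T, mod_freqs[keep]

def pooled_features(x, fs, seg_len, n_acoustic_bands=8, n_mod_bands=4):
    """Log band energies per segment, pooled over segments by mean and std."""
    per_segment = []
    for start in range(0, len(x) - seg_len + 1, seg_len):
        ms, _ = modulation_spectrum(x[start:start + seg_len], fs)
        # Average energy inside coarse (acoustic band, modulation band) cells
        cells = [[cell.mean() for cell in np.array_split(rows, n_mod_bands, axis=1)]
                 for rows in np.array_split(ms, n_acoustic_bands, axis=0)]
        per_segment.append(np.log(np.array(cells) + 1e-10).ravel())
    feats = np.array(per_segment)                       # (n_segments, n_cells)
    # Pooling maps a variable number of segments to one fixed-length vector
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

# Usage: 4 s of synthetic noise at an assumed 16 kHz rate, pooled over 1 s segments
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4 * fs)
feats = pooled_features(x, fs, seg_len=fs)
print(feats.shape)  # (64,)
```

Pooling statistics over segments is one simple way to turn utterances of different lengths into vectors of equal size for a downstream regressor; whether the paper pools over time, frequency, or both is not stated in the abstract.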

Updated: 2018-01-01