An Investigation of Partition-based and Phonetically-aware Acoustic Features for Continuous Emotion Prediction from Speech
IEEE Transactions on Affective Computing (IF 11.2), Pub Date: 2020-10-01, DOI: 10.1109/taffc.2018.2821135
Zhaocheng Huang, Julien Epps

Phonetic variability has long been considered a confounding factor in emotional speech processing, so phonetic features have rarely been explored. Surprisingly, however, some features carrying purely phonetic information have shown state-of-the-art performance for continuous prediction of emotion dimensions (e.g., arousal and valence), and the underlying causes remain unknown. In this article, we present in-depth investigations into phonetic features on three widely used corpora (RECOLA, SEMAINE and USC CreativeIT), exploring this from two perspectives: acoustic space partitioning information and phonetic content. First, comparisons of multiple partitioning methods confirm the significance of partitioning information in speech and reveal that varying the number of partitions has a greater effect on valence prediction than on arousal prediction: a detailed representation of the acoustic space is needed for valence, whilst a general one is adequate for arousal. Second, phoneme-specific examination of phonetic features suggests that phonetic content is less emotionally informative than partitioning information, and is more important for arousal than for valence. Furthermore, we propose a novel set of phonetically-aware acoustic features, which attain significant improvements over conventional reference acoustic features for valence (in particular) and arousal prediction on each of RECOLA, SEMAINE and CreativeIT.

Updated: 2020-10-01