Articulation constrained learning with application to speech emotion recognition
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2019-08-20, DOI: 10.1186/s13636-019-0157-9
Mohit Shah, Ming Tu, Visar Berisha, Chaitali Chakrabarti, Andreas Spanias

Speech emotion recognition methods that combine articulatory information with acoustic features have previously been shown to improve recognition performance. However, collecting articulatory data at a large scale is often infeasible, which restricts the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ1-regularized logistic regression cost function is extended with additional constraints that force the model to reconstruct articulatory data, yielding sparse, interpretable representations jointly optimized for both tasks. Furthermore, the model requires articulatory features only during training; inference on out-of-sample data uses speech features alone. Experiments evaluate emotion recognition performance over the vowels /AA/, /AE/, /IY/, and /UW/, and over complete utterances. Incorporating articulatory information is shown to significantly improve performance for valence-based classification. Results for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions.
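The abstract does not state the cost function explicitly. Below is a minimal sketch of one plausible reading, assuming the articulatory constraint enters as a least-squares reconstruction term and that the classification and reconstruction tasks are coupled through a shared group-sparse support over the acoustic features; the paper's exact coupling may differ. All names, hyperparameters (lam, mu, lr), and the synthetic data are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def group_soft_threshold(row, t):
    """Proximal operator of the group-l2 norm: shrink the whole row toward zero."""
    norm = np.linalg.norm(row)
    if norm <= t:
        return np.zeros_like(row)
    return (1.0 - t / norm) * row

def fit(X, y, Z, lam=0.05, mu=1.0, lr=0.1, n_iter=1000):
    """Jointly fit emotion classifier w and articulatory reconstructor A.

    X : (n, d) acoustic features        (available at train and test time)
    y : (n,)   binary emotion labels    in {0, 1}
    Z : (n, k) articulatory features    (available at training time only)
    Minimizes  logistic_loss(w) + (mu/n)*||X A - Z||_F^2 + lam * group_l1([w | A])
    by proximal gradient descent, so both tasks select the same acoustic features.
    """
    n, d = X.shape
    k = Z.shape[1]
    w = np.zeros(d)          # classification weights
    A = np.zeros((d, k))     # reconstruction weights
    for _ in range(n_iter):
        # gradient step on the smooth losses
        g_w = X.T @ (sigmoid(X @ w) - y) / n
        g_A = 2.0 * mu * X.T @ (X @ A - Z) / n
        w -= lr * g_w
        A -= lr * g_A
        # proximal step: group soft-thresholding per acoustic feature,
        # tying the sparsity pattern of w and A together
        for j in range(d):
            row = group_soft_threshold(np.concatenate(([w[j]], A[j])), lr * lam)
            w[j], A[j] = row[0], row[1:]
    return w, A

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 20))
    A_true = np.zeros((20, 3))
    A_true[:5] = rng.standard_normal((5, 3))
    Z = X @ A_true + 0.1 * rng.standard_normal((200, 3))
    y = (X[:, 0] - X[:, 1] > 0).astype(float)
    w, A = fit(X, y, Z)
    acc = np.mean((sigmoid(X @ w) > 0.5) == y)   # inference needs only X and w
    print(f"train accuracy: {acc:.2f}, selected features: {np.sum(np.abs(w) > 1e-8)}")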
