FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition
arXiv - CS - Human-Computer Interaction, Pub Date: 2021-09-15, DOI: arxiv-2109.07916
Bonaventure F. P. Dossou, Yeno K. S. Gbenou

Using mel-spectrograms instead of conventional MFCC features, we assess the ability of convolutional neural networks to accurately recognize and classify emotions from speech data. We introduce FSER, a speech emotion recognition model trained on four valid speech databases, which achieves a classification accuracy of 95.05% over 8 emotion classes: anger, anxiety, calm, disgust, happiness, neutral, sadness, and surprise. On each benchmark dataset, FSER outperforms the best previously published models, achieving state-of-the-art performance. We show that FSER remains reliable independently of language, speaker sex, and other external factors. Additionally, we describe how FSER could potentially be used to improve mental and emotional health care, and how our analysis and findings can serve as guidelines and benchmarks for further work in this direction.
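The first design choice above, feeding the CNN mel-spectrograms rather than MFCCs, is easier to see with the two features side by side. MFCCs are obtained by taking a DCT of the log-mel spectrogram along the frequency axis and keeping only the first few coefficients, so the log-mel "image" retains spectral detail that the truncated DCT drops, and its frequency axis stays locally ordered, which suits convolutional filters. The following is a minimal NumPy sketch under stated assumptions: a 16 kHz mono signal, a 512-point FFT, a 10 ms hop, 64 mel bands, 13 MFCCs, and the HTK-style mel formula are all illustrative choices, not the FSER paper's settings, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping an (n_fft // 2 + 1)-bin power spectrum to n_mels bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 points equally spaced on the mel scale give each filter a left, center, right edge
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    """Hypothetical helper: returns an (n_mels, n_frames) log-mel 'image' for a CNN."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop  # assumes len(signal) >= n_fft
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                        # small offset avoids log(0)


def mfcc_from_log_mel(log_mel, n_mfcc=13):
    """Hypothetical helper: orthonormal DCT-II over the mel axis, truncated to n_mfcc rows."""
    n_mels = log_mel.shape[0]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi / n_mels * (n + 0.5) * k) * np.sqrt(2.0 / n_mels)
    dct[0] /= np.sqrt(2.0)                               # orthonormal scaling of the DC row
    return dct @ log_mel                                 # (n_mfcc, n_frames)


# Usage: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t)
lm = log_mel_spectrogram(signal, sr=sr)
mf = mfcc_from_log_mel(lm)
print(lm.shape, mf.shape)  # (64, 97) (13, 97)
```

In this sketch a CNN would consume the full 64-row log-mel array, whereas an MFCC pipeline would keep only the 13 DCT rows; which input works better for emotion recognition is the empirical question the abstract addresses, and the sketch makes no claim about it.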

Updated: 2021-09-17