Recognize basic emotional states in speech by machine learning techniques using mel-frequency cepstral coefficient features, Journal of Intelligent & Fuzzy Systems

Journal of Intelligent & Fuzzy Systems (IF 2), Pub Date: 2020-07-06, DOI: 10.3233/jifs-179963
Ningning Yang ₁ , Nilanjan Dey ₂ , R. Simon Sherratt ₃ , Fuqian Shi ₁


Speech Emotion Recognition (SER) has been widely used in many fields, such as the smart home assistants commonly found on the market. A smart home assistant that could detect the user's emotion would improve communication between the user and the assistant, enabling the assistant to offer more productive feedback. Thus, the aim of this work is to analyze emotional states in speech and to propose a suitable algorithm, weighing performance versus complexity, for deployment in smart home devices. Four emotional speech sets were selected from the Berlin Emotional Database (EMO-DB) as experimental data, and 26 MFCC features were extracted from each type of emotional speech to identify the emotions of happiness, anger, sadness and neutrality. Speaker-independent SER experiments were then conducted using a Back Propagation Neural Network (BPNN), an Extreme Learning Machine (ELM), a Probabilistic Neural Network (PNN) and a Support Vector Machine (SVM). Considering both recognition accuracy and processing time, this work shows that SVM performed best among the four methods, making it a good candidate for deploying SER in smart home devices. SVM achieved an overall accuracy of 92.4% while keeping computational requirements low for both training and testing. We conclude that MFCC features and SVM classification models, as used in speaker-independent experiments, are highly effective for the automatic prediction of emotion.
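The abstract states only that 26 MFCC features were extracted per emotional utterance; the frame length, filterbank size and pooling are not given. The following is a minimal NumPy sketch of a conventional MFCC pipeline (pre-emphasis, 25 ms Hamming-windowed frames with a 10 ms hop, a 40-band mel filterbank, log compression and an orthonormal DCT-II). It assumes a 16 kHz signal and assumes the 26 features are the first 26 cepstral coefficients averaged over all frames. The function names `mel_filterbank`, `mfcc` and `utterance_features`, and every default parameter value, are illustrative choices, not the authors' settings.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def mfcc(signal, sr, n_mfcc=26, n_filters=40, frame_len=0.025, hop=0.010, n_fft=512):
    """Frame-level MFCCs, shape (n_frames, n_mfcc). Parameter values are assumptions."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - 0.97 * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    flen = int(round(frame_len * sr))
    fstep = int(round(hop * sr))
    n_frames = 1 + max(0, (len(emphasized) - flen) // fstep)
    # Zero-pad so that the last frame is complete
    pad = max(0, (n_frames - 1) * fstep + flen - len(emphasized))
    emphasized = np.append(emphasized, np.zeros(pad))
    # Slice overlapping frames and apply a Hamming window to each
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    # Power spectrum of each frame, then mel filterbank energies
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_e = np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)
    # Orthonormal DCT-II decorrelates the log energies; keep the first n_mfcc terms
    n = np.arange(n_filters)
    k = np.arange(n_mfcc)[:, None]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    dct[0] *= np.sqrt(1.0 / n_filters)
    dct[1:] *= np.sqrt(2.0 / n_filters)
    return log_e @ dct.T


def utterance_features(signal, sr):
    """One fixed-length vector per utterance: the mean of each coefficient over frames."""
    return mfcc(signal, sr).mean(axis=0)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr  # one second of a synthetic 220 Hz tone
    tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)
    print(utterance_features(tone, sr).shape)  # (26,)
```

Each utterance thus becomes a single 26-dimensional vector, which could then be passed to a classifier such as scikit-learn's `sklearn.svm.SVC`; the kernel and hyperparameters behind the reported 92.4% accuracy are not given in the abstract. Averaging over frames discards temporal dynamics, so appending per-coefficient standard deviations or delta coefficients is a common variant.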


Updated: 2020-07-07
