Improved speech emotion recognition with Mel frequency magnitude coefficient
Applied Acoustics ( IF 3.4 ) Pub Date : 2021-03-26 , DOI: 10.1016/j.apacoust.2021.108046
J. Ancilin , A. Milton

Automatic speech emotion recognition using machine learning is an in-demand research topic in affective computing. Identifying suitable speech features is challenging because the features need to emphasize the emotional information carried by the speech. Spectral features play a vital role in emotion recognition from speech signals. In this paper, two modifications are made to the extraction of the Mel frequency cepstral coefficient (MFCC): the magnitude spectrum is used instead of the energy spectrum, and the discrete cosine transform is excluded. The resulting feature is the Mel frequency magnitude coefficient (MFMC), which is the log of the magnitude spectrum on a non-linear Mel frequency scale. MFMC and three conventional spectral features (MFCC, log frequency power coefficient, and linear prediction cepstral coefficient) are tested on the Berlin, Ravdess, Savee, EMOVO, eNTERFACE and Urdu databases with a multiclass support vector machine as the classifier. As a standalone feature, MFMC recognizes emotion with an accuracy of 81.50% for Berlin, 64.31% for Ravdess, 75.63% for Savee, 73.30% for EMOVO, 56.41% for eNTERFACE and 95.25% for Urdu. MFMC is found to be a better spectral feature for identifying emotion from speech than the conventional features.
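The two modifications described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the frame length, hop size, FFT size, number of filters, the Hamming window, the triangular filter shape, and the Hz-to-Mel formula are all assumptions chosen for illustration, since the abstract does not specify them. The sketch only shows where the Mel frequency magnitude coefficient departs from the usual Mel frequency cepstral coefficient pipeline: the magnitude spectrum is not squared, and no discrete cosine transform follows the log.

```python
import numpy as np


def hz_to_mel(f):
    # A common Hz-to-Mel mapping (assumed; the abstract does not give one).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def frame_signal(signal, frame_len, hop):
    # Split a 1-D signal into overlapping frames, shape (n_frames, frame_len).
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def mel_log_features(signal, sample_rate, frame_len=400, hop=160,
                     n_fft=512, n_filters=26, use_magnitude=True):
    """Log Mel filterbank outputs per frame, with no DCT applied.

    use_magnitude=True  -> magnitude spectrum (the modification described above)
    use_magnitude=False -> energy (squared-magnitude) spectrum, for comparison
    """
    frames = frame_signal(signal, frame_len, hop) * np.hamming(frame_len)
    spectrum = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    if not use_magnitude:
        spectrum = spectrum ** 2
    fb = mel_filterbank(n_filters, n_fft, sample_rate)
    # Small floor avoids log(0) on silent frames.
    return np.log(spectrum @ fb.T + 1e-10)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000.0 * t)  # synthetic test signal, not real speech
    feats = mel_log_features(tone, sr)
    print(feats.shape)  # (frames, filters)
```

Each row of the returned array is one frame's feature vector. In an experiment like the one described, per-utterance statistics of such frame-level features would typically be passed to the classifier, although the abstract does not state how the frames are aggregated.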




Updated: 2021-03-27