Speech emotion recognition model based on Bi-GRU and Focal Loss
Pattern Recognition Letters (IF 5.1), Pub Date: 2020-11-11, DOI: 10.1016/j.patrec.2020.11.009
Zijiang Zhu, Weihuang Dai, Yi Hu, Junshan Li

To address inconsistent sample durations and class imbalance in speech emotion corpora, this paper proposes a speech emotion recognition model based on Bi-GRU (Bidirectional Gated Recurrent Unit) and Focal Loss. The model builds on an in-depth study of the CRNN (Convolutional Recurrent Neural Network). Within the CRNN, Bi-GRU is used to effectively lengthen speech samples of short duration, and the Focal Loss function is used to handle the classification difficulties caused by the imbalance of emotion categories among the samples. Several methods are compared experimentally, with weighted average recall (WAR), unweighted average recall (UAR), and the confusion matrix (CM) used as evaluation metrics. The experimental results show that the proposed model improves recognition accuracy and mitigates the sample imbalance problem of the IEMOCAP database, and that the improvement in speech emotion recognition performance is not due to adjustment of model parameters or changes in the model topology.
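
The abstract does not give the loss or the metrics in formula form, so the following is only a minimal sketch under stated assumptions. It uses the standard focal loss definition, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and computes WAR and UAR from a confusion matrix. The function names, the default `gamma=2.0`, and the optional `alpha` weights are illustrative choices, not values taken from the paper.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for a single sample.

    probs  : predicted class probabilities (assumed to sum to 1)
    target : index of the true class
    gamma  : focusing parameter (assumed default; gamma=0 gives cross-entropy)
    alpha  : optional per-class weights (hypothetical, not from the paper)
    """
    # Clamp to avoid log(0) when the model assigns zero probability.
    p_t = min(max(probs[target], 1e-12), 1.0)
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)**gamma shrinks the loss of well-classified samples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def recalls_from_confusion(cm):
    """Return (WAR, UAR) for a confusion matrix indexed as cm[true][predicted].

    WAR weights each class recall by its sample count, which equals
    overall accuracy. UAR is the plain mean of the per-class recalls.
    """
    per_class = []
    correct = 0
    total = 0
    for i, row in enumerate(cm):
        support = sum(row)
        correct += row[i]
        total += support
        if support > 0:  # skip classes that have no true samples
            per_class.append(row[i] / support)
    war = correct / total
    uar = sum(per_class) / len(per_class)
    return war, uar


if __name__ == "__main__":
    # Easy sample: the focal term reduces its loss relative to cross-entropy.
    print(focal_loss([0.9, 0.1], 0), focal_loss([0.9, 0.1], 0, gamma=0.0))
    # Imbalanced two-class toy example (made-up counts, not IEMOCAP data).
    print(recalls_from_confusion([[90, 10], [5, 5]]))
```

On an imbalanced toy matrix such as the one above, WAR is dominated by the majority class while UAR treats both classes equally, which is why the two metrics are usually reported together for corpora with skewed emotion categories.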




Updated: 2020-11-22