Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
Journal of Signal Processing Systems (IF 1.6), Pub Date: 2020-06-20, DOI: 10.1007/s11265-020-01538-x
Zhao Huijuan , Ye Ning , Wang Ruchuan

Speech emotion recognition is challenging because the definition of emotion is ambiguous and its feature representation is complex. Accurate feature representation is one of the key factors for successful speech emotion recognition. Previous studies have shown that 3D input composed of the static log-Mel spectrogram together with its deltas and delta-deltas is very effective at filtering out irrelevant features. The challenge of speech emotion recognition is also reflected in the need for fine-grained classification: typical affective-computing applications, such as psychological counseling and emotion regulation, require fine-grained emotion recognition. Motivated by these two observations, this paper proposes an end-to-end hierarchical multi-task learning framework that proceeds from coarse to fine to achieve fine-grained emotion recognition. Using the 3D data as input, the first stage is trained on coarse emotion types, and its output is then used to assist the second-stage training on fine emotion types. Comparative experiments on the IEMOCAP corpus show that the coarse-to-fine classification approach significantly improves performance over the baseline models.
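The 3D input described above stacks a log-Mel spectrogram with its first- and second-order time derivatives as three channels. The abstract does not give the delta window or the exact derivative formula, so the sketch below assumes the common HTK-style regression formula with a window of N = 2 and takes an already computed log-Mel spectrogram of shape (n_mels, frames) as input; the function names are illustrative, not the paper's code.

```python
import numpy as np


def deltas(feat: np.ndarray, N: int = 2) -> np.ndarray:
    """HTK-style regression deltas along the time axis (axis=1).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Edge frames are handled by repeating the first/last frame (assumption).
    """
    T = feat.shape[1]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Pad N frames on each side of the time axis by edge replication.
    padded = np.pad(feat, ((0, 0), (N, N)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for n in range(1, N + 1):
        # Original frame t sits at padded index t + N.
        out += n * (padded[:, N + n:N + n + T] - padded[:, N - n:N - n + T])
    return out / denom


def stack_3d(log_mel: np.ndarray, N: int = 2) -> np.ndarray:
    """Return (n_mels, frames, 3): static, delta, delta-delta channels."""
    d1 = deltas(log_mel, N)
    d2 = deltas(d1, N)
    return np.stack([log_mel, d1, d2], axis=-1)


# Usage: a toy "spectrogram" whose values rise linearly over time.
x = np.tile(np.arange(20.0), (4, 1))
cube = stack_3d(x)
```

For a linear ramp the delta channel equals 1 away from the edges and the delta-delta channel equals 0, which is a quick sanity check that the regression formula behaves like a time derivative.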

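One plausible way to let a coarse prediction assist a fine-grained classifier is a two-head network in which the fine head receives the shared embedding concatenated with the coarse posterior, trained with a weighted sum of the two cross-entropy losses. The NumPy sketch below shows only that forward pass and loss under stated assumptions: the fine-to-coarse grouping (`COARSE_OF`), the loss weight `alpha`, and all function names are hypothetical, and the paper's actual coarse categories, network layers, and training schedule are not given in the abstract.

```python
import numpy as np

# Hypothetical fine -> coarse grouping; the abstract does not say which
# coarse categories the paper uses.
FINE = ["happy", "excited", "angry", "frustrated", "sad", "neutral"]
COARSE_OF = {"happy": 0, "excited": 0, "angry": 1,
             "frustrated": 1, "sad": 1, "neutral": 2}
N_COARSE = 3


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(p: np.ndarray, y: np.ndarray) -> float:
    return float(-np.mean(np.log(p[np.arange(len(y)), y] + 1e-12)))


def forward(h, W_c, b_c, W_f, b_f):
    """h: shared utterance embedding (batch, d).

    The coarse head runs first; the fine head sees the shared embedding
    concatenated with the coarse posterior.
    """
    p_coarse = softmax(h @ W_c + b_c)
    h_fine = np.concatenate([h, p_coarse], axis=1)
    p_fine = softmax(h_fine @ W_f + b_f)
    return p_coarse, p_fine


def joint_loss(p_coarse, p_fine, y_fine, alpha=0.5):
    """Weighted multi-task loss; coarse labels are derived from fine ones."""
    y_coarse = np.array([COARSE_OF[FINE[i]] for i in y_fine])
    return (alpha * cross_entropy(p_coarse, y_coarse)
            + (1 - alpha) * cross_entropy(p_fine, y_fine))


# Usage with random weights and a random batch of 8 embeddings.
rng = np.random.default_rng(0)
d = 16
h = rng.normal(size=(8, d))
W_c = rng.normal(scale=0.1, size=(d, N_COARSE))
b_c = np.zeros(N_COARSE)
W_f = rng.normal(scale=0.1, size=(d + N_COARSE, len(FINE)))
b_f = np.zeros(len(FINE))
y_fine = rng.integers(0, len(FINE), size=8)
p_c, p_f = forward(h, W_c, b_c, W_f, b_f)
loss = joint_loss(p_c, p_f, y_fine)
```

Deriving the coarse label from the fine label means a single annotation per utterance supervises both tasks; a staged schedule (training the coarse head first, then the fine head) could reuse the same functions by setting `alpha` to 1 and then lowering it.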



Updated: 2020-06-23