Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM
Speech Communication (IF 3.2), Pub Date: 2020-11-19, DOI: 10.1016/j.specom.2020.11.003
Bagus Tris Atmaja , Masato Akagi

Automatic speech emotion recognition (SER) by a computer is a critical component of more natural human-machine interaction. As in human-human interaction, the ability to perceive emotion correctly is essential for deciding what to do next in a given situation. One open issue in SER is whether acoustic features need to be combined with other data such as facial expressions, text, and motion capture. This research proposes combining acoustic and text information through a late-fusion approach consisting of two steps. First, acoustic and text features are trained separately in deep learning systems. Second, the predictions from those deep learning systems are fed into a support vector machine (SVM) to predict the final regression score. The task addressed is dimensional emotion modeling, because it enables deeper analysis of affective states. Experimental results show that this two-stage, late-fusion approach obtains higher performance than any one-stage processing, with a linear correlation between one-stage and two-stage performance. The late-fusion approach also improves on a previous early-fusion result, as measured by the concordance correlation coefficient (CCC).
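The abstract reports results as concordance correlation coefficient (CCC) scores. As a minimal sketch of how that metric is commonly computed, the function below uses the standard definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population statistics. The function name `ccc` and the sample values are illustrative assumptions and are not taken from the paper.

```python
from statistics import fmean


def ccc(y_true, y_pred):
    """Concordance correlation coefficient of two equal-length sequences.

    Illustrative helper (name and interface are assumptions, not the
    authors' code). Uses population (biased) variance and covariance.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have the same length (at least 2)")

    mean_t = fmean(y_true)
    mean_p = fmean(y_pred)

    # Population variance of each sequence and their covariance.
    var_t = fmean((t - mean_t) ** 2 for t in y_true)
    var_p = fmean((p - mean_p) ** 2 for p in y_pred)
    cov = fmean((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))

    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    # Hypothetical labels and predictions for one emotion dimension.
    labels = [1.0, 2.0, 3.0]
    shifted = [2.0, 3.0, 4.0]  # perfectly correlated, but offset by 1
    print(ccc(labels, labels))
    print(ccc(labels, shifted))
```

Unlike Pearson correlation, CCC penalizes differences in mean and scale: in the example above, predictions offset by a constant 1 have a Pearson correlation of 1 but a CCC of 4/7.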




Updated: 2020-12-01