Towards Interpretable and Transferable Speech Emotion Recognition: Latent Representation Based Analysis of Features, Methods and Corpora
arXiv - CS - Sound, Pub Date: 2021-05-05, DOI: arxiv-2105.02055
Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

In recent years, speech emotion recognition (SER) has been used in wide-ranging applications, from healthcare to the commercial sector. In addition to signal processing approaches, methods for SER now also use deep learning techniques. However, generalizing over languages, corpora and recording conditions is still an open challenge in the field. Furthermore, due to the black-box nature of deep learning algorithms, a newer challenge is the lack of interpretability and transparency in the models and the decision-making process. This is critical when SER systems are deployed in applications that influence human lives. In this work, we address this gap by providing an in-depth analysis of the decision-making process of the proposed SER system. To that end, we present low-complexity SER based on undercomplete and denoising autoencoders that achieve an average classification accuracy of over 55% for four-class emotion classification. Following this, we investigate the clustering of emotions in the latent space to understand the influence of the corpora on the model behavior and to obtain a physical interpretation of the latent embedding. Lastly, we explore the contribution of each input feature to the performance of the SER system.
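
The abstract names undercomplete and denoising autoencoders but gives no architectural details. The following is a minimal sketch of that general idea, not the authors' model: a single tanh bottleneck narrower than the input (undercomplete), trained to reconstruct the clean input from a noise-corrupted copy (denoising). The function names, layer sizes, learning rate, noise level, and the synthetic feature matrix are all assumptions made for illustration.

```python
import numpy as np


def train_denoising_autoencoder(X, latent_dim=2, noise_std=0.1,
                                lr=0.05, epochs=300, seed=0):
    """Fit a one-bottleneck denoising autoencoder by full-batch gradient descent.

    Hypothetical helper for illustration; hyperparameters are arbitrary.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if latent_dim >= d:
        raise ValueError("undercomplete: latent_dim must be smaller than the input size")
    p = {
        "W_enc": rng.normal(0.0, 0.1, (d, latent_dim)),
        "b_enc": np.zeros(latent_dim),
        "W_dec": rng.normal(0.0, 0.1, (latent_dim, d)),
        "b_dec": np.zeros(d),
    }
    losses = []
    for _ in range(epochs):
        # Corrupt the input; the reconstruction target stays the clean X.
        X_noisy = X + rng.normal(0.0, noise_std, X.shape)
        Z = np.tanh(X_noisy @ p["W_enc"] + p["b_enc"])   # latent embedding
        X_hat = Z @ p["W_dec"] + p["b_dec"]              # linear decoder
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both layers.
        g_out = 2.0 * err / err.size
        g_pre = (g_out @ p["W_dec"].T) * (1.0 - Z ** 2)  # tanh derivative
        grads = {
            "W_enc": X_noisy.T @ g_pre,
            "b_enc": g_pre.sum(axis=0),
            "W_dec": Z.T @ g_out,
            "b_dec": g_out.sum(axis=0),
        }
        for name in p:
            p[name] -= lr * grads[name]
    return p, losses


def encode(X, params):
    """Map clean inputs to the latent space (no noise at inference time)."""
    return np.tanh(X @ params["W_enc"] + params["b_enc"])


# Synthetic stand-in for per-utterance acoustic features (NOT the paper's data):
# 200 samples, 8 features generated from 2 underlying factors, then standardized.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)

params, losses = train_denoising_autoencoder(X)
Z = encode(X, params)
print(Z.shape, losses[0], losses[-1])
```

In a setup like this, the rows of Z are the latent embeddings one would inspect for emotion clusters or pass to a small classifier; how the paper actually does either is not stated in the abstract.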

Updated: 2021-05-06