A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces
Electronics (IF 2.9), Pub Date: 2020-10-20, DOI: 10.3390/electronics9101725
Gintautas Tamulevičius , Gražina Korvel , Anil Bora Yayak , Povilas Treigys , Jolita Bernatavičienė , Bożena Kostek

In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional speech data in different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset of more than 10,000 emotional utterances. Although the gathered databases are bimodal, our focus is on the acoustic representation only, under the assumption that the speech audio signal carries sufficient emotional information for detection and retrieval. Several two-dimensional acoustic feature spaces, namely cochleagrams, spectrograms, mel-cepstrograms, and a fractal dimension-based space, are employed as representations of speech emotional features. A convolutional neural network (CNN) is used as the classifier. The results show the superiority of cochleagrams over the other feature spaces. In the CNN-based speaker-independent cross-linguistic speech emotion recognition (SER) experiment, an accuracy of over 90% is achieved, which is close to that of the monolingual SER case.
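
The approach outlined in the abstract, converting each utterance into a two-dimensional acoustic feature map and classifying it with a CNN, can be sketched roughly as follows. This is an illustrative sketch only: the library choices (librosa and PyTorch), the mel-spectrogram used here as a stand-in for the cochleagram, and the layer sizes are assumptions rather than the authors' actual feature-extraction parameters or network architecture.

```python
# Illustrative sketch only; parameters and architecture are assumptions,
# not the configuration reported in the paper.
import librosa
import numpy as np
import torch
import torch.nn as nn


def mel_feature_map(path, sr=16000, n_mels=64):
    """Load one utterance and compute a dB-scaled 2D mel-spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)


class EmotionCNN(nn.Module):
    """A minimal 2D CNN over a feature map (cochleagram, spectrogram, etc.)."""

    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes)
        )

    def forward(self, x):
        # x has shape (batch, 1, n_mels, frames)
        return self.head(self.features(x))


# Example usage with a hypothetical file and an assumed number of emotion classes:
# feat = mel_feature_map("utterance.wav")
# x = torch.tensor(feat, dtype=torch.float32).unsqueeze(0).unsqueeze(0)
# logits = EmotionCNN(n_classes=7)(x)
```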
