Recognition of speech emotion using custom 2D-convolution neural network deep learning algorithm
Intelligent Data Analysis (IF 0.9), Pub Date: 2020-09-30, DOI: 10.3233/ida-194747
Kudakwashe Zvarevashe, Oludayo O. Olugbara

Speech emotion recognition has become the heart of most human-computer interaction applications in the modern world. The growing need to develop emotionally intelligent devices has opened up many research opportunities. Most researchers in this field have applied handcrafted features and machine learning techniques to recognise speech emotion. However, these techniques require extra processing steps, and handcrafted features are usually not robust. They are computationally intensive because the curse of dimensionality results in low discriminating power. Research has shown that deep learning algorithms are effective for extracting robust and salient features from datasets. In this study, we have developed a custom 2D-convolution neural network that performs both feature extraction and classification of vocal utterances. The neural network has been evaluated against a deep multilayer perceptron neural network and a deep radial basis function neural network using the Berlin database of emotional speech, the Ryerson audio-visual emotional speech database and the Surrey audio-visual expressed emotion corpus. The described deep learning algorithm achieves the highest precision, recall and F1-scores when compared to other existing algorithms. It is observed that there may be a need to develop customized solutions for different language settings depending on the area of application.

Updated: 2020-10-04