Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques
Knowledge-Based Systems (IF 8.8), Pub Date: 2020-10-31, DOI: 10.1016/j.knosys.2020.106547
Turker Tuncer, Sengul Dogan, U. Rajendra Acharya

Speech emotion recognition is a challenging research problem in knowledge-based systems, and various methods have been proposed to achieve high classification performance. To improve classification performance in speech emotion recognition, this work presents a nonlinear multi-level feature generation model built on a cryptographic structure. The novelty of this work is the use of a cryptographic structure called a shuffle box for feature generation, combined with iterative neighborhood component analysis for feature selection. The proposed method has three main stages: (i) multi-level feature generation using the tunable Q-factor wavelet transform (TQWT), (ii) feature generation using the twine shuffle pattern (twine-shuf-pat), and (iii) selection of discriminative features using iterative neighborhood component analysis (INCA), followed by classification. TQWT is a multi-level wavelet transform that generates high-, medium-, and low-level wavelet coefficients. The proposed twine-shuf-pat technique extracts features from these decomposed wavelet coefficients, and the INCA feature selector then selects the most significant features. The model was validated on four public speech emotion databases: RAVDESS Speech, Emo-DB (Berlin), SAVEE, and EMOVO. Using a 10-fold cross-validation strategy, the twine-shuf-pat and INCA based method achieved classification accuracies of 87.43%, 90.09%, 84.79%, and 79.08% on the RAVDESS, Emo-DB (Berlin), SAVEE, and EMOVO corpora, respectively. A mixed database created by combining the four public databases yielded 80.05% classification accuracy. The resulting speech emotion model is ready to be tested on larger databases and can be used in healthcare applications.
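The abstract does not spell out how twine-shuf-pat turns a shuffle box into features. The Python sketch below shows one plausible reading under stated assumptions. Each 16-sample window is reordered by a fixed permutation, `SHUFFLE_BOX`, which is a hypothetical placeholder and is not verified against the paper or against the TWINE cipher's shuffle. Each reordered value is then compared with its cyclic successor to form 16 bits. The bits are split into two 8-bit codes, and the codes are histogrammed into 512 features. The helper `multilevel_features` is also hypothetical. It applies the extractor to each decomposed band and concatenates the results, mirroring the multi-level idea.

```python
from typing import List, Sequence

# Hypothetical 16-element shuffle box (a permutation of 0..15). This is a
# placeholder for illustration, not verified to match the paper.
SHUFFLE_BOX = [5, 0, 1, 4, 7, 12, 3, 8, 13, 6, 9, 2, 15, 10, 11, 14]
WINDOW = len(SHUFFLE_BOX)


def shuffle_pattern_features(signal: Sequence[float]) -> List[int]:
    """Histogram of shuffle-pattern codes: 256 bins for the low byte, 256 for the high byte."""
    if len(signal) < WINDOW:
        raise ValueError(f"signal needs at least {WINDOW} samples")
    hist_low = [0] * 256
    hist_high = [0] * 256
    # Slide a 16-sample window over the signal with stride 1.
    for start in range(len(signal) - WINDOW + 1):
        block = signal[start:start + WINDOW]
        # Reorder the window with the shuffle box.
        shuffled = [block[i] for i in SHUFFLE_BOX]
        # Signum-style comparison of each shuffled value with its cyclic successor.
        bits = [
            1 if shuffled[i] >= shuffled[(i + 1) % WINDOW] else 0
            for i in range(WINDOW)
        ]
        # Pack the first 8 bits and the last 8 bits into two byte-valued codes.
        low = sum(bit << k for k, bit in enumerate(bits[:8]))
        high = sum(bit << k for k, bit in enumerate(bits[8:]))
        hist_low[low] += 1
        hist_high[high] += 1
    return hist_low + hist_high


def multilevel_features(bands: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: concatenate features extracted from each decomposed band."""
    features: List[int] = []
    for band in bands:
        features.extend(shuffle_pattern_features(band))
    return features
```

Each band contributes a fixed-length 512-bin vector regardless of its length. Features from bands of different lengths (such as high-, medium-, and low-level coefficients) can therefore be concatenated directly.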

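The iterative part of INCA can be sketched as a loop over candidate subset sizes. In this minimal Python sketch, the per-feature `weights` are assumed to come from a neighborhood component analysis step that is not shown. Features are ranked by weight. For each k in an assumed range, the top-k subset is scored with a leave-one-out 1-nearest-neighbor error. That loss is a stand-in chosen for illustration, because the abstract does not state the classifier or the range used. The subset with the lowest error is returned. The function names `loo_1nn_error` and `iterative_selection` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def loo_1nn_error(X: Sequence[Sequence[float]], y: Sequence[int], cols: Sequence[int]) -> float:
    """Leave-one-out misclassification rate of a 1-nearest-neighbor classifier on `cols`."""
    n = len(X)
    errors = 0
    for i in range(n):
        best_dist, best_label = math.inf, None
        for j in range(n):
            if i == j:
                continue
            # Squared Euclidean distance restricted to the selected columns.
            dist = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        if best_label != y[i]:
            errors += 1
    return errors / n


def iterative_selection(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    weights: Sequence[float],
    k_min: int = 1,
    k_max: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Pick the top-k weighted feature subset with the lowest leave-one-out error."""
    # Rank feature indices by weight, most important first.
    order = sorted(range(len(weights)), key=lambda c: weights[c], reverse=True)
    if k_max is None:
        k_max = len(order)
    best_cols: List[int] = order[:k_min]
    best_err = math.inf
    for k in range(k_min, k_max + 1):
        cols = order[:k]
        err = loo_1nn_error(X, y, cols)
        # Strict comparison: on ties the smaller subset is kept.
        if err < best_err:
            best_cols, best_err = cols, err
    return best_cols, best_err
```

For example, take `X = [[0.0, 5.0], [0.1, -5.0], [1.0, 4.0], [1.1, -4.0]]`, `y = [0, 0, 1, 1]`, and `weights = [0.9, 0.1]`. The call returns `([0], 0.0)`: adding the noisy second feature raises the leave-one-out error to 1.0, so the loop keeps only the first feature.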



Updated: 2020-11-05