Egyptian Arabic speech emotion recognition using prosodic, spectral and wavelet features
Speech Communication (IF 2.4), Pub Date: 2020-05-22, DOI: 10.1016/j.specom.2020.04.005
Lamiaa Abdel-Hamid

Speech emotion recognition (SER) has recently received increasing interest due to rapid advances in affective computing and human-computer interaction. English, German, Mandarin and Indian languages are among the languages most commonly considered for SER, along with other European and Asian languages. However, few studies have implemented Arabic SER systems because of the scarcity of available Arabic speech emotion databases. Although Egyptian Arabic is considered one of the most widely spoken and understood Arabic dialects in the Middle East, no Egyptian Arabic speech emotion database has yet been devised. This work introduces a semi-natural Egyptian Arabic speech emotion (EYASE) database, created from an award-winning Egyptian TV series. The EYASE database includes utterances from 3 male and 3 female professional actors covering four emotions: angry, happy, neutral and sad. Prosodic, spectral and wavelet features are computed from the EYASE database for emotion recognition. In addition to the classical pitch, intensity, formant and Mel-frequency cepstral coefficient (MFCC) features widely used for SER, long-term average spectrum (LTAS) and wavelet parameters are also considered. Speaker-independent and speaker-dependent experiments were performed for three cases: (1) emotion vs. neutral classification, (2) arousal and valence classification and (3) multi-emotion classification. Several analyses were performed to explore different aspects of Arabic SER, including the effects of gender and culture. Furthermore, feature ranking was performed to evaluate the relevance of the LTAS and wavelet features for SER compared with the more widely used prosodic and spectral features. Anger detection performance was also compared across different combinations of the prosodic, spectral and wavelet features.
The feature ranking and anger detection analyses showed that both LTAS and wavelet features are relevant for Arabic SER and that they significantly improved emotion recognition rates.
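The abstract names LTAS and wavelet parameters but does not say how they are extracted. The following is a minimal Python sketch, assuming NumPy, of one common way to obtain such features; it is not the author's implementation. The function names `ltas_db` and `haar_wavelet_log_energies` are hypothetical, as are all parameter values: 25 ms Hann frames with a 10 ms hop, 250 Hz LTAS bands, and a 4-level Haar wavelet decomposition summarized by log sub-band energies.

```python
import numpy as np


def ltas_db(signal, sr, frame_ms=25.0, hop_ms=10.0, band_hz=250.0):
    """Hypothetical LTAS: frame spectra averaged over time, pooled into bands (dB).

    Frame length, hop and band width are illustrative assumptions,
    not parameters reported in the paper.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Power spectrum of every Hann-windowed frame.
    spectra = [
        np.abs(np.fft.rfft(signal[i * hop:i * hop + frame_len] * window)) ** 2
        for i in range(n_frames)
    ]
    # "Long-term" step: average the frame spectra over the whole utterance.
    mean_power = np.mean(spectra, axis=0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    # Pool FFT bins into bands [0, band_hz), [band_hz, 2 * band_hz), ... up to Nyquist.
    edges = np.arange(0.0, sr / 2 + band_hz, band_hz)
    bands = [mean_power[(freqs >= lo) & (freqs < hi)].mean()
             for lo, hi in zip(edges[:-1], edges[1:])]
    return 10.0 * np.log10(np.asarray(bands) + 1e-12)


def haar_wavelet_log_energies(signal, levels=4):
    """Hypothetical wavelet features: log energy of each Haar DWT sub-band.

    A hand-written orthonormal Haar transform keeps the sketch NumPy-only;
    the wavelet family and number of levels are assumptions.
    """
    approx = np.asarray(signal, dtype=float)
    features = []
    for _ in range(levels):
        approx = approx[: len(approx) // 2 * 2]  # drop a trailing odd sample
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)  # high-pass half of the split
        approx = (even + odd) / np.sqrt(2.0)  # low-pass half, decomposed again
        features.append(np.log(np.sum(detail ** 2) + 1e-12))
    features.append(np.log(np.sum(approx ** 2) + 1e-12))
    # Order: [D1 (finest, highest frequency), ..., D_levels, A_levels]
    return np.asarray(features)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a 1 s utterance: a 200 Hz tone plus light noise.
    x = np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(sr)
    ltas = ltas_db(x, sr)
    wav = haar_wavelet_log_energies(x)
    print(ltas.shape, wav.shape)  # (32,) (5,)
    feature_vector = np.concatenate([ltas, wav])
```

In a setup like this, the two vectors would be concatenated with prosodic and MFCC features before classification or feature ranking. Because the Haar transform used here is orthonormal, the sub-band energies add up to the energy of the input whenever its length is a multiple of `2**levels`, which gives a quick sanity check.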




Updated: 2020-05-22