Analysis and Classification of Cold Speech using Variational Mode Decomposition
IEEE Transactions on Affective Computing (IF 9.6), Pub Date: 2020-04-01, DOI: 10.1109/taffc.2017.2761750
Suman Deb, Samarendra Dandapat, Jarek Krajewski

This paper presents the analysis and classification of cold speech, a pathological speech recorded while the speaker is suffering from the common cold. The common cold affects the nose and throat, and because both play an important role in speech production, speech characteristics are altered during this pathology. In this work, variational mode decomposition (VMD) is used for the analysis and classification of cold speech. VMD decomposes the speech signal into a number of sub-signals, or modes, which may better expose the pathological information that characterizes cold speech. Four statistics (mean, variance, kurtosis, and skewness) are extracted from each decomposed sub-signal. Along with these statistics, center frequency, energy, peak amplitude, spectral entropy, permutation entropy, and Rényi entropy are evaluated and used as features. Mutual information (MI) is then employed to assign weights to the features. In terms of classification rate, the proposed feature outperforms linear prediction coefficients (LPC), mel-frequency cepstral coefficients (MFCC), a Teager energy operator (TEO) based feature, and the ComParE feature sets (IS09-emotion and IS13-ComParE). The proposed feature achieves an average recognition rate of 90.02 percent on the IITG cold speech database and 66.84 percent on the URTIC database.
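As a rough illustration of the per-mode features the abstract lists, the following Python sketch computes a subset of them (mean, variance, skewness, kurtosis, energy, peak amplitude, a center-frequency estimate, and spectral entropy) for a single sub-signal. It is not the authors' implementation. The function name `mode_features`, the sampling rate, and the synthetic sinusoid standing in for a VMD mode are all assumptions made for the example, and the center frequency is approximated by a power-weighted spectral centroid, whereas VMD itself estimates a center frequency for each mode during decomposition.

```python
import numpy as np


def mode_features(mode, fs):
    """Compute illustrative features for one 1-D sub-signal (hypothetical helper).

    Assumes the mode is not constant, so its variance is non-zero.
    """
    mode = np.asarray(mode, dtype=float)
    mean = mode.mean()
    variance = mode.var()
    centered = mode - mean
    std = np.sqrt(variance)

    # Standardized third and fourth moments (kurtosis is non-excess).
    skewness = np.mean(centered ** 3) / std ** 3
    kurtosis = np.mean(centered ** 4) / std ** 4

    # One-sided power spectrum, normalized to a probability distribution.
    power = np.abs(np.fft.rfft(mode)) ** 2
    freqs = np.fft.rfftfreq(mode.size, d=1.0 / fs)
    p = power / power.sum()

    # Shannon entropy (in bits) of the normalized spectrum.
    nonzero = p[p > 0]
    spectral_entropy = -np.sum(nonzero * np.log2(nonzero))

    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "energy": np.sum(mode ** 2),
        "peak_amplitude": np.max(np.abs(mode)),
        # Assumption: spectral centroid used as a stand-in for the VMD center frequency.
        "center_frequency": np.sum(freqs * p),
        "spectral_entropy": spectral_entropy,
    }


# Synthetic stand-in for one decomposed mode: a 100 Hz tone, 1 s at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
demo_mode = np.sin(2 * np.pi * 100 * t)
demo_features = mode_features(demo_mode, fs)
```

In a full pipeline these values would be computed for every mode of every utterance and concatenated into one feature vector before any weighting or classification step. Permutation entropy, Rényi entropy, and the MI-based weighting are left out of this sketch.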

Updated: 2020-04-01