Wavelet sub-band features for voice disorder detection and classification
Multimedia Tools and Applications (IF 3.0), Pub Date: 2020-08-04, DOI: 10.1007/s11042-020-09424-1
Girish Gidaye, Jagannath Nirmal, Kadria Ezzine, Mondher Frikha

Acoustic analysis of the speech signal enables non-intrusive, affordable, unbiased and fast assessment of voice pathologies. This assessment provides complementary information to otolaryngologists for the preliminary diagnosis of a pathological larynx. Several voice impairment assessment systems based on acoustic analysis have been introduced in recent years. Nevertheless, these systems are tested on only one or two datasets and are not independent of database and human bias. In this paper, a unified wavelet-based framework is proposed for evaluating voice disorders, which is independent of database and human bias. The stationary wavelet transform (SWT) is used to decompose the speech signal, since it offers good time and frequency localization. Energy and statistical features are extracted from each sub-band after multilevel decomposition. The higher the decomposition level, the higher the dimension of the feature vector. To reduce the dimension of the feature vector, an information gain (IG) based feature selection technique is used to select the most relevant features and discard redundant ones. The resulting feature vector is assessed using support vector machine (SVM), stochastic gradient descent (SGD) and artificial neural network (ANN) classifiers. Recordings of the vowel /a/, vocalized at natural pitch by both healthy and pathological subjects, are taken from German, English, Arabic and Spanish speech databases. In the first phase of experiments, the input speech signal is detected as healthy or pathological. The second phase classifies input speech samples as healthy, cyst, paralysis or polyp. Experimental results demonstrate that the extracted energy and statistical features can be used as possible cues for voice disorder evaluation. The most important aspect of the proposed method is that the features are independent of the fundamental frequency. The detection and classification rates attained are comparable to those of other state-of-the-art approaches.
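
The decomposition and feature-extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the mother wavelet or the exact statistics, so the Haar wavelet, the periodic boundary handling, the choice of mean and standard deviation, and the function names haar_swt, subband_features and feature_vector are all assumptions made for illustration. The input is a synthetic tone standing in for a sustained vowel recording.

```python
import math
import statistics


def haar_swt(signal, levels):
    """Undecimated (stationary) Haar transform with periodic extension.

    Assumption: Haar wavelet; the abstract does not say which wavelet is used.
    Returns (details, approx): details[j] is the level j+1 detail sub-band and
    approx is the final approximation. Nothing is downsampled, so every
    sub-band has the same length as the input.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for level in range(levels):
        step = 2 ** level  # filter taps are spread 2**level samples apart
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / math.sqrt(2) for a, b in zip(approx, shifted)])
        approx = [(a + b) / math.sqrt(2) for a, b in zip(approx, shifted)]
    return details, approx


def subband_features(band):
    """Energy plus two simple statistics (assumed choices) for one sub-band."""
    return [
        sum(x * x for x in band),  # energy
        statistics.fmean(band),    # mean
        statistics.pstdev(band),   # population standard deviation
    ]


def feature_vector(signal, levels):
    """Concatenate the features of every detail band and the approximation."""
    details, approx = haar_swt(signal, levels)
    features = []
    for band in details + [approx]:
        features.extend(subband_features(band))
    return features


# Synthetic stand-in for a sustained vowel: a 100 Hz tone sampled at 8 kHz.
signal = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(512)]
vec = feature_vector(signal, levels=3)
print(len(vec))  # 3 features x (3 detail bands + 1 approximation) = 12
```

The vector length grows by a fixed number of features with each extra decomposition level, which is why the abstract follows this step with feature selection before classification.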




Updated: 2020-08-04