Automatic assessment of intelligibility in speakers with dysarthria from coded telephone speech using glottal features
Computer Speech & Language (IF 3.1), Pub Date: 2020-06-02, DOI: 10.1016/j.csl.2020.101117
N.P. Narendra, Paavo Alku

In clinical practice, speech-language pathologists assess the intelligibility of speakers with dysarthria through auditory perceptual tests, which require the patient to be present at the hospital and involve time-consuming examinations. Frequent clinical monitoring can therefore be costly and logistically inconvenient for both patients and medical experts. Here, we aim to automate intelligibility assessment in dysarthric speakers with an objective, speech-based method that can be employed in a telescreening application. The proposed method classifies each speaker into one of four levels of speech intelligibility (very low, low, mediocre and high). The study compares several automatic methods for assessing the intelligibility level in speakers with dysarthria. These methods use glottal features, which capture information generated at the level of the vocal folds, and operate on coded telephone speech (i.e. speech of the kind used in telescreening applications). In addition to the glottal features, openSMILE features are used as acoustic baseline features. Multiclass support vector machine (SVM) classifiers are trained using features obtained from coded speech utterances and the corresponding intelligibility level labels. Separate sets of multiclass SVMs are trained using the glottal and acoustic features individually as well as in combination. Coded telephone speech is generated with the adaptive multi-rate codec in two operational bandwidths (narrowband and wideband) from utterances of an open database of dysarthric speech (Universal Access-Speech). Experimental results showed good classification accuracies for the glottal features, indicating their effectiveness in intelligibility level assessment for speakers with dysarthria even in the challenging coded condition. Classification accuracy improved when the glottal features were combined with the openSMILE acoustic features, which validates the complementary nature of the glottal features.
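
The classification setup described in the abstract (per-utterance feature vectors, four intelligibility labels, and multiclass SVMs trained on glottal features, acoustic features, and their combination) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the feature matrices are random stand-ins rather than real glottal or openSMILE features, and the feature dimensions, the RBF kernel, the value of C, the use of scikit-learn, and the random utterance-level train/test split are all assumptions that the abstract does not specify.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The four intelligibility levels named in the abstract.
LEVELS = ["very low", "low", "mediocre", "high"]


def make_synthetic_features(n_per_class, n_dims, shift, rng):
    """Random stand-in features (assumption): class k is centred at k * shift."""
    X = np.vstack([
        rng.normal(loc=k * shift, scale=1.0, size=(n_per_class, n_dims))
        for k in range(len(LEVELS))
    ])
    y = np.repeat(np.arange(len(LEVELS)), n_per_class)
    return X, y


def evaluate(X, y, seed=0, train_fraction=0.75):
    """Train a multiclass SVM on a random split and return test accuracy."""
    order = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_fraction * len(y))
    train, test = order[:cut], order[cut:]
    # Standardise features, then fit an SVM (kernel and C are assumptions).
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    model.fit(X[train], y[train])
    return model.score(X[test], y[test])


rng = np.random.default_rng(0)
# Hypothetical dimensions: 12 "glottal" and 30 "acoustic" values per utterance.
glottal, labels = make_synthetic_features(40, 12, 0.8, rng)
acoustic, _ = make_synthetic_features(40, 30, 0.5, rng)
# Feature-level combination: concatenate the two vectors for each utterance.
combined = np.hstack([glottal, acoustic])

results = {
    "glottal": evaluate(glottal, labels),
    "acoustic": evaluate(acoustic, labels),
    "combined": evaluate(combined, labels),
}
for name, accuracy in results.items():
    print(f"{name:>8}: accuracy = {accuracy:.2f}")
```

With real data, the rows of each matrix would be features extracted from the narrowband or wideband coded utterances, and the split would normally be arranged so that no speaker appears in both the training and test sets; the random split above is used only to keep the sketch short.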




Updated: 2020-06-02