Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors
Speech Communication (IF 2.4), Pub Date: 2021-02-18, DOI: 10.1016/j.specom.2021.02.001
Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals

Speech sound disorders are a common communication impairment in childhood. Because speech disorders can negatively affect the lives and development of children, clinical intervention is often recommended. To help with diagnosis and treatment, clinicians use instrumented methods such as spectrograms or ultrasound tongue imaging to analyse speech articulations. Analysis with these methods can be laborious for clinicians, so there is growing interest in automating it. In this paper, we investigate the contribution of ultrasound tongue imaging to the automatic detection of speech articulation errors. Our systems are trained on typically developing child speech and augmented with a database of adult speech using audio and ultrasound. Evaluation on typically developing speech indicates that pre-training on adult speech and jointly using ultrasound and audio gives the best results, with an accuracy of 86.9%. To evaluate on disordered speech, we collect pronunciation scores from experienced speech and language therapists, focusing on cases of velar fronting and gliding of /r/. The scores show good inter-annotator agreement for velar fronting, but not for gliding errors. For automatic velar fronting error detection, the best results are obtained when jointly using ultrasound and audio. The best system correctly detects 86.6% of the errors identified by experienced clinicians. Of all the segments identified as errors by the best system, 73.2% match errors identified by clinicians. Results on automatic gliding detection are harder to interpret due to poor inter-annotator agreement, but appear promising. Overall, the findings suggest that automatic detection of speech articulation errors has the potential to be integrated into ultrasound intervention software for automatically quantifying progress during speech therapy.
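The two velar fronting figures in the abstract correspond to what are usually called recall (the share of clinician-identified errors that the system detects, 86.6%) and precision (the share of system-flagged segments that clinicians also marked as errors, 73.2%). The sketch below shows how such figures could be computed from per-segment labels. The function name and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence, Tuple


def precision_recall(clinician: Sequence[bool],
                     system: Sequence[bool]) -> Tuple[float, float]:
    """Compare system error flags against clinician error labels.

    Each position is one speech segment; True means "marked as an error".
    Hypothetical helper for illustration only.
    """
    if len(clinician) != len(system):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(clinician, system))
    true_pos = sum(1 for c, s in pairs if c and s)        # both mark an error
    false_pos = sum(1 for c, s in pairs if not c and s)   # system only
    false_neg = sum(1 for c, s in pairs if c and not s)   # clinician only

    flagged = true_pos + false_pos          # segments the system flagged
    actual = true_pos + false_neg           # segments clinicians flagged
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Toy labels (made up for this example, not taken from the paper).
clinician = [True, True, True, False, False, False, True, False]
system = [True, True, False, True, False, False, True, False]

precision, recall = precision_recall(clinician, system)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Reporting both values matters here: a system could reach high recall simply by flagging most segments as errors, which would show up as low precision.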




Updated: 2021-02-24