Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training
arXiv - CS - Sound. Pub Date: 2020-01-10, DOI: arxiv-2001.04260
Seung Hee Yang, Minhwa Chung

Dysarthria is a motor speech impairment affecting millions of people. Dysarthric speech can be far less intelligible than that of non-dysarthric speakers, causing significant communication difficulties. The goal of our work is to develop a model for dysarthric-to-healthy speech conversion using a cycle-consistent GAN. Using 18,700 dysarthric and 8,610 healthy control Korean utterances that were recorded in a previous study for automatic voice keyboard recognition, the generator is trained to transform dysarthric speech into healthy speech in the spectral domain, and the result is then converted back to speech. Objective evaluation using automatic speech recognition of the generated utterances on a held-out test set shows that, after adversarial training, recognition performance improves over the original dysarthric speech, with the absolute WER lowered by 33.4%. This result demonstrates that the proposed GAN-based conversion method is useful for improving dysarthric speech intelligibility.
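
The abstract reports its result as an absolute reduction in word error rate (WER). As a rough illustration of that metric, the sketch below computes WER as the word-level edit distance divided by the number of reference words, and then takes the difference between two WER values. The function names and the toy sentences are hypothetical and are not taken from the paper; the authors' actual recognizer and scoring setup are not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_wer_reduction(wer_before: float, wer_after: float) -> float:
    """Difference in WER, in the same units as the inputs (not a ratio)."""
    return wer_before - wer_after


# Toy example with made-up transcripts (not data from the paper).
reference = "turn on the light"
before = word_error_rate(reference, "turn the night")      # one deletion, one substitution
after = word_error_rate(reference, "turn on the night")    # one substitution
reduction = absolute_wer_reduction(before, after)
```

An absolute reduction subtracts one WER from the other, so it differs from a relative reduction, which would divide that difference by the starting WER.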

Updated: 2020-01-14