Data Augmentation Using Virtual Microphone Array Synthesis and Multi-Resolution Feature Extraction for Isolated Word Dysarthric Speech Recognition
IEEE Journal of Selected Topics in Signal Processing (IF 8.7), Pub Date: 2020-01-01, DOI: 10.1109/jstsp.2020.2972161
Mariya Celin T A, Nagarajan Thangavelu, Vijayalakshmi P

Dysarthria is a speech-motor disorder that affects the articulatory system and inhibits the speech communication efforts of the affected speakers. To address their communication problems, a speech recognition-based augmentative and alternative communication aid is an attractive option. However, successful development of an automatic speech recognition (ASR)-based aid depends on the availability of sufficient speech data for training. Building an ASR system for dysarthric speakers is difficult due to the limited amount of training data and the large inter- and intra-speaker variability. Using normal speakers' speech data for data augmentation or adaptation for dysarthric speakers with low intelligibility is extremely challenging because of the large variation in acoustic characteristics between these two categories of speakers. In the current article, a two-level data augmentation is performed on dysarthric speech, based on virtual linear microphone array synthesis followed by multi-resolution feature extraction. With the augmented speech data, an isolated-word hybrid DNN-HMM-based ASR system is trained on the UA-Speech corpus and on a Tamil dysarthric speech corpus developed by the authors. The ASR system achieves reductions in WER of up to 32.79% and 35.75% for speakers with low and very low intelligibility, respectively, compared to recent work on data augmentation for dysarthric speech recognition.
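Since the abstract only names the two augmentation ideas, the sketch below illustrates them in a minimal, self-contained way: delayed and attenuated copies of an utterance as they would be seen by a virtual linear microphone array, and framing at several window lengths as the basis of multi-resolution feature extraction. The function names, array geometry, and window sizes are illustrative assumptions, not the authors' implementation or the paper's settings.

```python
# A minimal sketch (not the paper's implementation) of the two augmentation ideas
# named in the abstract: (1) extra copies of an utterance as if captured by a
# virtual linear microphone array, and (2) framing at multiple time resolutions.
# All geometry values and window sizes below are illustrative assumptions.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def virtual_linear_array_copies(signal, sr, n_mics=4, spacing=0.05,
                                source_pos=(1.0, 0.5)):
    """Simulate each microphone of a virtual linear array (spaced `spacing` metres
    apart along the x-axis) by delaying and attenuating the dry signal according
    to its distance from an assumed point source."""
    copies = []
    for m in range(n_mics):
        mic_pos = np.array([m * spacing, 0.0])
        dist = np.linalg.norm(np.asarray(source_pos) - mic_pos)
        delay = int(round(dist / SPEED_OF_SOUND * sr))   # integer-sample delay
        atten = 1.0 / max(dist, 1e-3)                    # simple 1/r attenuation
        delayed = np.concatenate([np.zeros(delay), signal]) * atten
        copies.append(delayed[:len(signal)])
    return copies  # each copy can serve as an additional training utterance

def multi_resolution_frames(signal, sr, win_lengths_ms=(10, 25, 50)):
    """Frame the signal with several window lengths; a feature extractor (e.g.
    MFCCs) run on each resolution gives complementary views of the utterance."""
    frame_sets = {}
    for win_ms in win_lengths_ms:
        win = int(sr * win_ms / 1000)
        hop = win // 2
        n_frames = max(1, 1 + (len(signal) - win) // hop)
        frames = np.stack([signal[i * hop: i * hop + win] for i in range(n_frames)])
        frame_sets[win_ms] = frames * np.hamming(win)
    return frame_sets

if __name__ == "__main__":
    sr = 16000
    utterance = np.random.randn(sr)  # stand-in for a recorded dysarthric word
    augmented = virtual_linear_array_copies(utterance, sr)
    feats = multi_resolution_frames(utterance, sr)
    print(len(augmented), {k: v.shape for k, v in feats.items()})
```

Under these assumptions, each simulated microphone signal and each feature resolution would simply be added to the training pool for the DNN-HMM system described above.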

Updated: 2020-01-01