Multitask-Based Joint Learning Approach To Robust ASR For Radio Communication Speech


arXiv - CS - Sound, Pub Date: 2021-07-22, DOI: arxiv-2107.10701
Duo Ma, Nana Hou, Van Tung Pham, Haihua Xu, Eng Siong Chng

To realize robust end-to-end automatic speech recognition (E2E ASR) under radio communication conditions, we propose a multitask-based method that jointly trains a speech enhancement (SE) module as the front-end and an E2E ASR model as the back-end. One advantage of the proposed method is that the entire system can be trained from scratch. Unlike prior work, neither component requires separate pre-training and fine-tuning. Through analysis, we found that the success of the proposed method rests on three factors. First, multitask learning is essential: the SE network learns not only to produce more intelligible speech but also to generate speech that benefits recognition. Second, preserving the phase of the noisy speech is critical for improving ASR performance. Third, we propose a dual-channel data augmentation training method to obtain further improvement; specifically, we combine the clean and enhanced speech to train the whole system. We evaluate the proposed method on the RATS English data set, achieving a relative WER reduction of 4.6% with the joint training method and a relative WER reduction of up to 11.2% with the proposed data augmentation method.
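
Two ideas from the abstract can be illustrated with a minimal Python sketch: reusing the phase of the noisy signal when rebuilding the enhanced spectrum, and combining the enhancement and recognition objectives into one multitask loss. This is not the authors' implementation. The function names, the `se_weight` parameter, and its default value are assumptions made for illustration; the abstract does not state how the two losses are weighted.

```python
import cmath


def recombine_with_noisy_phase(enhanced_mag, noisy_spec):
    """Pair each enhanced magnitude with the phase of the matching noisy bin.

    enhanced_mag: non-negative floats (hypothetical output of an SE module).
    noisy_spec:   complex STFT bins of the noisy input, same length.
    """
    if len(enhanced_mag) != len(noisy_spec):
        raise ValueError("magnitude and spectrum must have the same length")
    return [
        mag * cmath.exp(1j * cmath.phase(noisy_bin))
        for mag, noisy_bin in zip(enhanced_mag, noisy_spec)
    ]


def multitask_loss(se_loss, asr_loss, se_weight=0.5):
    """Weighted sum of the SE and ASR losses (the weight is an assumption)."""
    if not 0.0 <= se_weight <= 1.0:
        raise ValueError("se_weight must lie in [0, 1]")
    return se_weight * se_loss + (1.0 - se_weight) * asr_loss


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction, e.g. 0.046 corresponds to 4.6%."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

In a joint training setup of this kind, the gradient of the combined loss would flow through both modules, which is one way the SE front-end can be pushed toward output that helps recognition and not only toward cleaner-sounding speech.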


Updated: 2021-07-23
