Speech Synthesis as Augmentation for Low-Resource ASR, arXiv - CS - Sound

arXiv - CS - Sound. Pub Date: 2020-12-23, DOI: arxiv-2012.13004
Deblin Bagchi, Shannon Wotherspoon, Zhuolin Jiang, Prasanna Muthukumar

Speech synthesis might hold the key to low-resource speech recognition. Data augmentation techniques have become an essential part of modern speech recognition training. Yet, they are simple, naive, and rarely reflect real-world conditions. Meanwhile, speech synthesis techniques have been rapidly getting closer to the goal of achieving human-like speech. In this paper, we investigate the possibility of using synthesized speech as a form of data augmentation to lower the resources necessary to build a speech recognizer. We experiment with three different kinds of synthesizers: statistical parametric, neural, and adversarial. Our findings are interesting and point to new research directions for the future.
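The abstract does not say how the synthesized speech is combined with the real training data. The sketch below shows one common way to do it: add synthetic utterances to the real corpus, capped at a fixed fraction of the real set's size, then shuffle the result. This is an assumption, not the authors' method. `Utterance`, `augment_with_tts`, `synth_ratio`, and the `synthesize` callback are all hypothetical names. `synthesize` stands in for any of the three synthesizer families named above and returns a path to the generated audio.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    audio_path: str   # location of the waveform
    transcript: str   # reference text for ASR training
    synthetic: bool   # True if produced by a speech synthesizer


def augment_with_tts(
    real: List[Utterance],
    extra_texts: List[str],
    synthesize: Callable[[str], str],  # hypothetical TTS hook: text -> audio path
    synth_ratio: float,
    seed: int = 0,
) -> List[Utterance]:
    """Add synthetic utterances to the real set (assumed mixing scheme).

    At most synth_ratio * len(real) synthetic utterances are added,
    further capped by how many extra texts are available.
    """
    rng = random.Random(seed)
    budget = int(synth_ratio * len(real))
    # Pick which texts to synthesize without replacement.
    texts = rng.sample(extra_texts, min(budget, len(extra_texts)))
    synthetic = [Utterance(synthesize(t), t, True) for t in texts]
    combined = real + synthetic
    rng.shuffle(combined)  # interleave real and synthetic examples
    return combined


if __name__ == "__main__":
    real = [Utterance(f"real/{i}.wav", f"real text {i}", False) for i in range(10)]
    texts = [f"extra sentence {i}" for i in range(20)]
    fake_tts = lambda t: f"tts/{t.replace(' ', '_')}.wav"  # placeholder synthesizer
    mixed = augment_with_tts(real, texts, fake_tts, synth_ratio=0.5)
    print(len(mixed), sum(u.synthetic for u in mixed))
```

With `synth_ratio` exposed as a parameter, you can sweep it to see how far synthetic data substitutes for real recordings. The low-resource setting the paper targets raises exactly that question.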

Updated: 2020-12-25