Unsupervised Automatic Speech Recognition: A Review
arXiv - CS - Computation and Language, Pub Date: 2021-06-09, DOI: arxiv-2106.04897
Hanan Aldarmaki, Asad Ullah, Nazar Zaki

Automatic Speech Recognition (ASR) systems can be trained to achieve remarkable performance given large amounts of manually transcribed speech, but large labeled data sets can be difficult or expensive to acquire for all languages of interest. In this paper, we review the research literature to identify models and ideas that could lead to fully unsupervised ASR, including unsupervised segmentation of the speech signal, unsupervised mapping from speech segments to text, and semi-supervised models with nominal amounts of labeled examples. The objective of the study is to identify the limitations of what can be learned from speech data alone and to understand the minimum requirements for speech recognition. Identifying these limitations would help optimize the resources and efforts in ASR development for low-resource languages.

Updated: 2021-06-10