Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters
arXiv - CS - Sound. Pub Date: 2020-07-06, DOI: arxiv-2007.03001
Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages, and of simplifying the overall deployment of ASR systems that support diverse languages. We perform an extensive benchmark on 51 languages, with varying amounts of training data per language (from 100 hours to 1100 hours). We compare three variants of multilingual training, ranging from a single joint model that does not know the input language, to one that uses this information, to a multi-head model (one head per language cluster). We show that multilingual training of ASR models on several languages can improve recognition performance, in particular on low-resource languages. We see 20.9%, 23%, and 28.8% average relative WER reduction compared to monolingual baselines for the joint model, the joint model with language input, and the multi-head model, respectively. To our knowledge, this is the first work studying multilingual ASR at massive scale, with more than 50 languages and more than 16,000 hours of audio across them.
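To make the third variant concrete, below is a minimal PyTorch-style sketch of a multi-head acoustic model: a shared encoder whose output is routed to one classification head per language cluster. The layer types, dimensions, and per-cluster vocabulary sizes are illustrative assumptions for exposition only, not the paper's actual 1-billion-parameter architecture.

import torch
import torch.nn as nn

class MultiHeadASR(nn.Module):
    """Sketch of the multi-head multilingual variant: a shared encoder
    followed by one output head per language cluster. All hyperparameters
    here are hypothetical, chosen only to keep the example small."""

    def __init__(self, n_mel=80, d_model=512, cluster_vocab_sizes=(64, 96, 128)):
        super().__init__()
        # Shared encoder over log-mel features (illustrative layer sizes).
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mel, d_model, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        # One linear output head per language cluster, each with its own
        # cluster-specific output vocabulary size.
        self.heads = nn.ModuleList(
            nn.Linear(d_model, v) for v in cluster_vocab_sizes
        )

    def forward(self, feats, cluster_id):
        # feats: (batch, n_mel, time)
        enc = self.encoder(feats).transpose(1, 2)   # (batch, time', d_model)
        logits = self.heads[cluster_id](enc)        # (batch, time', vocab)
        return logits.log_softmax(dim=-1)           # e.g. as input to a CTC loss

# Usage: route a batch of utterances belonging to cluster 1 through its head.
model = MultiHeadASR()
x = torch.randn(4, 80, 400)          # 4 utterances, 80 mel bins, 400 frames
log_probs = model(x, cluster_id=1)   # shape (4, 100, 96)

The joint-model variants from the abstract correspond to collapsing the heads into a single shared output layer, optionally concatenating a language embedding to the encoder input when the language identity is provided.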

Updated: 2020-07-09