Multilingual and unsupervised subword modeling for zero-resource languages
Computer Speech & Language (IF 3.1), Pub Date: 2020-04-17, DOI: 10.1016/j.csl.2020.101098
Enno Hermann, Herman Kamper, Sharon Goldwater

Subword modeling for zero-resource languages aims to learn low-level representations of speech audio without using transcriptions or other resources from the target language (such as text corpora or pronunciation dictionaries). A good representation should capture phonetic content and abstract away from other types of variability, such as speaker differences and channel noise. Previous work in this area has primarily focused on unsupervised learning from target language data only, and has been evaluated only intrinsically. Here we directly compare multiple methods, including some that use only target language speech data and some that use transcribed speech from other (non-target) languages, and we evaluate using two intrinsic measures as well as a downstream unsupervised word segmentation and clustering task. We find that combining two existing target-language-only methods yields better features than either method alone. Nevertheless, even better results are obtained by extracting target language bottleneck features using a model trained on other languages. Cross-lingual training using just one other language is enough to provide this benefit, but multilingual training helps even more. In addition to these results, which hold across both intrinsic measures and the extrinsic task, we discuss the qualitative differences between the different types of learned features.
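To make the bottleneck-feature idea concrete, here is a minimal sketch (not the authors' implementation) of how such features are typically obtained: a network is trained to classify phone labels from one or more well-resourced languages, and the activations of a narrow hidden layer are then extracted as features for the zero-resource target language. The layer sizes, the 39-dimensional MFCC inputs, and the pooled 120-phone output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    """Phone classifier with a narrow bottleneck layer for feature extraction."""
    def __init__(self, n_input=39, n_hidden=512, n_bottleneck=40, n_phones=120):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_input, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_bottleneck),  # narrow bottleneck layer
        )
        # Classifier over the pooled phone set of the (non-target) training
        # languages; it is discarded at feature-extraction time.
        self.classifier = nn.Sequential(nn.ReLU(), nn.Linear(n_bottleneck, n_phones))

    def forward(self, frames):
        bnf = self.encoder(frames)
        return self.classifier(bnf), bnf

model = BottleneckNet()
frames = torch.randn(100, 39)  # 100 target-language frames (dummy data)
_, features = model(frames)    # bottleneck features for the zero-resource language
print(features.shape)          # torch.Size([100, 40])
```

After supervised training on the other languages' labeled speech, only `model.encoder` is applied to target-language audio; multilingual training corresponds to pooling (or sharing layers across) several languages' phone inventories at the output.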




Updated: 2020-04-17