The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
arXiv - CS - Computation and Language. Pub Date: 2020-11-23, DOI: arxiv-2011.11588
Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of four black-box, zero-shot metrics probing the quality of the learned models at four linguistic levels: phonetics, lexicon, syntax and semantics. We present results and analyses for a composite baseline made of the concatenation of three unsupervised systems: self-supervised contrastive representation learning (CPC), clustering (k-means) and language modeling (LSTM or BERT). The language models are trained on pseudo-text derived by clustering the learned representations. This simple pipeline shows better-than-chance performance on all four metrics, demonstrating the feasibility of spoken language modeling from raw speech. It nevertheless performs worse than text-based 'topline' systems trained on the same data, delineating the space to be explored by more sophisticated end-to-end models.
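To make the three-stage baseline concrete, here is a minimal Python sketch of the pipeline shape (features, then k-means quantization, then a language model on the pseudo-text). Everything in it is a placeholder: random vectors stand in for the CPC encoder's frame-level features, and an add-alpha bigram model stands in for the paper's LSTM/BERT, purely to keep the sketch self-contained and runnable.

```python
# Hypothetical sketch of the pipeline: features -> k-means units -> LM.
# NOT the paper's implementation; CPC and LSTM/BERT are replaced by toys.
import numpy as np
from sklearn.cluster import KMeans
from collections import defaultdict

rng = np.random.default_rng(0)

# Step 1: stand-in for CPC encoder output. In the real system these would
# be self-supervised contrastive features; here, random 256-dim frame
# vectors over a toy "corpus" of variable-length utterances.
utterances = [rng.normal(size=(rng.integers(50, 100), 256)) for _ in range(20)]

# Step 2: k-means quantization of frames into discrete pseudo-phone units.
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0)
kmeans.fit(np.concatenate(utterances))
pseudo_text = [kmeans.predict(u) for u in utterances]  # one unit id per frame

# Step 3: language model over the discrete units. The paper trains an LSTM
# or BERT on the pseudo-text; a smoothed bigram model is used here only to
# keep the example short.
counts = defaultdict(lambda: defaultdict(int))
for units in pseudo_text:
    for prev, cur in zip(units[:-1], units[1:]):
        counts[prev][cur] += 1

def log_prob(units, vocab_size=50, alpha=1.0):
    """Add-alpha smoothed bigram log-probability of a unit sequence."""
    lp = 0.0
    for prev, cur in zip(units[:-1], units[1:]):
        total = sum(counts[prev].values())
        lp += np.log((counts[prev][cur] + alpha) / (total + alpha * vocab_size))
    return lp

# Zero-shot probing compares such scores, e.g. checking that a real word's
# unit sequence is scored higher than a matched non-word's.
print(log_prob(pseudo_text[0]))
```

The black-box metrics never inspect the model internals: they only compare sequence scores like the one above across minimal pairs at the phonetic, lexical, syntactic and semantic levels.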

Updated: 2020-11-25