SUPERB: Speech processing Universal PERformance Benchmark
arXiv - CS - Sound. Pub Date: 2021-05-03. DOI: arxiv-2105.01051. Authors: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee
Self-supervised learning (SSL) has proven vital for advancing research in
natural language processing (NLP) and computer vision (CV). The paradigm
pretrains a shared model on large volumes of unlabeled data and achieves
state-of-the-art (SOTA) performance on various tasks with minimal adaptation. However, the
speech processing community lacks a similar setup to systematically explore the
paradigm. To bridge this gap, we introduce Speech processing Universal
PERformance Benchmark (SUPERB). SUPERB is a leaderboard to benchmark the
performance of a shared model across a wide range of speech processing tasks
with minimal architecture changes and labeled data. Among multiple usages of
the shared model, we especially focus on extracting the representation learned
from SSL due to its preferable re-usability. We present a simple framework to
solve SUPERB tasks by learning task-specialized lightweight prediction heads on
top of the frozen shared model. Our results demonstrate that the framework is
promising as SSL representations show competitive generalizability and
accessibility across SUPERB tasks. We release SUPERB as a challenge with a
leaderboard and a benchmark toolkit to fuel the research in representation
learning and general speech processing.
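The probing setup the abstract describes (a frozen shared model with a task-specialized lightweight prediction head, where only the head is trained) can be sketched as follows. This is a toy stand-in under assumed shapes and names, not the benchmark's actual encoder or toolkit code: the real setup uses pretrained SSL encoders and task heads defined by the SUPERB toolkit.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 16   # dimension of the frozen SSL representation (assumed)
CLASSES = 4   # small downstream label set, e.g. keyword spotting (assumed)

# Stand-in for a frozen pretrained encoder: a fixed projection whose
# weights are never updated during downstream training.
W_frozen = rng.standard_normal((8, HIDDEN))

def frozen_encoder(x):
    # x: (batch, 8) raw features -> (batch, HIDDEN) representations
    return np.tanh(x @ W_frozen)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy labeled data for one downstream task; labels are generated to be
# linearly predictable from the frozen representations.
x = rng.standard_normal((64, 8))
h = frozen_encoder(x)                      # computed once; encoder stays frozen
W_true = rng.standard_normal((HIDDEN, CLASSES))
y = (h @ W_true).argmax(axis=1)

# Lightweight task head: the ONLY trainable parameters in this setup.
W_head = np.zeros((HIDDEN, CLASSES))

# Train the head alone with gradient descent on cross-entropy loss.
for _ in range(200):
    p = softmax(h @ W_head)
    grad = p.copy()
    grad[np.arange(len(y)), y] -= 1.0      # dL/dlogits for cross-entropy
    W_head -= 0.5 * (h.T @ grad) / len(y)

acc = float((softmax(h @ W_head).argmax(axis=1) == y).mean())
```

Because the encoder is shared and frozen, its representations can be computed once and reused across every downstream task; only the small per-task head is fit, which is what makes the benchmark's "minimal architecture changes and labeled data" comparison possible.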
Updated: 2021-05-04