XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
arXiv - CS - Computation and Language. Pub Date: 2020-03-24, DOI: arxiv-2003.11080
Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders XTREME benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models, particularly on syntactic and sentence retrieval tasks. There is also a wide spread of results across languages. We release the benchmark to encourage research on cross-lingual learning methods that transfer linguistic knowledge across a diverse and representative set of languages and tasks.
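
The abstract refers to the zero-shot cross-lingual transfer setting: a multilingual encoder is fine-tuned on English task data only and then evaluated directly on test sets in other languages. The sketch below illustrates that protocol under stated assumptions; it is not the authors' code, and the model name, toy XNLI-style sentence pairs, and the tiny training loop are illustrative placeholders.

```python
# Minimal sketch of zero-shot cross-lingual transfer evaluation (assumptions:
# mBERT as the encoder, toy premise/hypothesis pairs, 3-class NLI labels).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"  # any multilingual encoder would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Toy XNLI-style pairs (0=entailment, 1=neutral, 2=contradiction); hypothetical data.
english_train = [
    ("A man is playing a guitar.", "A person is making music.", 0),
    ("A man is playing a guitar.", "Nobody is making any sound.", 2),
]
target_test = {  # hypothetical target-language test sets
    "es": [("Un hombre toca la guitarra.", "Una persona hace música.", 0)],
    "sw": [("Mwanamume anapiga gitaa.", "Mtu anatengeneza muziki.", 0)],
}

def encode(pairs):
    premises = [p for p, _, _ in pairs]
    hypotheses = [h for _, h, _ in pairs]
    labels = torch.tensor([y for _, _, y in pairs])
    batch = tokenizer(premises, hypotheses, padding=True, truncation=True,
                      return_tensors="pt")
    return batch, labels

# 1) Fine-tune on English data only.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):
    batch, labels = encode(english_train)
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# 2) Evaluate the same model on each target language with no target-language training.
model.eval()
for lang, pairs in target_test.items():
    batch, labels = encode(pairs)
    with torch.no_grad():
        preds = model(**batch).logits.argmax(dim=-1)
    acc = (preds == labels).float().mean().item()
    print(f"{lang}: zero-shot accuracy = {acc:.2f}")
```

In the benchmark, per-language scores like these are averaged across the covered languages and tasks, which is what exposes the gap between English performance and cross-lingually transferred performance noted above.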

Updated: 2020-09-07