GLGE: A New General Language Generation Evaluation Benchmark
arXiv - CS - Computation and Language Pub Date : 2020-11-24 , DOI: arxiv-2011.11928
Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan

Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress in pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks, without considering Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we further design three subtasks in terms of task difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard), yielding 24 subtasks in total for comprehensively comparing model performance. To encourage research on pretraining and transfer learning for NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet. The source code and dataset will be publicly available at https://github.com/microsoft/glge.
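The abstract does not spell out how a baseline is scored on a GLGE subtask, so below is a minimal sketch of how one of the summarization-style subtasks might be evaluated with one of the listed baselines (BART), using the Hugging Face transformers and rouge-score packages. The checkpoint name, generation settings, and the toy document/reference pair are illustrative assumptions, not part of GLGE's official harness.

```python
# Minimal sketch (not GLGE's official evaluation script): generate a summary
# with a pretrained BART baseline and score it with ROUGE, a metric family
# commonly used for summarization-style generation tasks.
from transformers import BartForConditionalGeneration, BartTokenizer
from rouge_score import rouge_scorer

# Assumed checkpoint; GLGE baselines would be fine-tuned per subtask.
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Toy source document and reference summary, purely for illustration.
document = ("Multi-task benchmarks such as GLUE and SuperGLUE have driven "
            "great progress in pretraining and transfer learning in NLP.")
reference = "Multi-task benchmarks have driven progress in NLP."

# Encode the source document and generate a hypothesis with beam search.
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, min_length=10, max_length=60)
hypothesis = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Score the hypothesis against the reference with ROUGE-1/2/L F-measure.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, hypothesis)
for name, score in scores.items():
    print(f"{name}: {score.fmeasure:.4f}")
```

Different GLGE subtasks would pair a task-specific test set with the metrics appropriate to that task; the ROUGE scoring above is only one such choice.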

Updated: 2020-11-25