It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
arXiv - CS - Artificial Intelligence, Pub Date: 2020-09-15, DOI: arxiv-2009.07118
Timo Schick, Hinrich Schütze

When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance on challenging natural language understanding benchmarks. In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain some form of task description, combined with gradient-based optimization; additionally exploiting unlabeled data gives further improvements. Based on our findings, we identify several key factors required for successful natural language understanding with small language models.
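
The cloze reformulation described above can be illustrated with a toy Python sketch. The paper's approach is known as Pattern-Exploiting Training (PET), but the code below is not the authors' implementation. It assumes a hypothetical textual-entailment task. The pattern string, the verbalizer words ("Yes", "No", "Maybe") and all function names are illustrative assumptions. The masked language model is replaced by a hand-written dictionary of scores for the [MASK] position. The sketch covers three steps. First, it turns the model's scores for the verbalizer words into a label distribution. Second, it computes the cross-entropy loss that gradient-based training would minimize; the gradient step itself is omitted. Third, it gives an unlabeled example soft labels by averaging the predictions of several such models. The simple unweighted average used here is an assumption.

```python
import math

MASK = "[MASK]"

def pattern(premise: str, hypothesis: str) -> str:
    """Hypothetical pattern: rewrite an input pair as a cloze question
    whose [MASK] slot the masked language model must fill."""
    return f'"{hypothesis}"? {MASK}, "{premise}"'

# Hypothetical verbalizer: each task label is expressed by one vocabulary word.
VERBALIZER = {"entailment": "Yes", "contradiction": "No", "neutral": "Maybe"}

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def label_distribution(mask_logits: dict[str, float]) -> dict[str, float]:
    """Keep only the MLM scores of verbalizer words at the [MASK] position
    and renormalize them into a distribution over task labels."""
    labels = list(VERBALIZER)
    probs = softmax([mask_logits[VERBALIZER[label]] for label in labels])
    return dict(zip(labels, probs))

def cross_entropy(dist: dict[str, float], gold_label: str) -> float:
    """Loss for one labeled example; training would minimize this by gradient descent."""
    return -math.log(dist[gold_label])

def soft_labels(ensemble_dists: list[dict[str, float]]) -> dict[str, float]:
    """Assumed unweighted average of several models' predictions on one
    unlabeled example, usable as a soft training target."""
    n = len(ensemble_dists)
    return {label: sum(d[label] for d in ensemble_dists) / n for label in VERBALIZER}
```

For example, `label_distribution({"Yes": 2.0, "No": 0.0, "Maybe": 0.0, "the": 5.0})` assigns the highest probability to "entailment". The high score for "the" is ignored because that word is not in the verbalizer. Restricting the output to a few label words is what lets a comparatively small pretrained model reuse its masked-LM head as a classifier.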

Updated: 2020-09-16