Investigating the Limitations of Transformers with Simple Arithmetic Tasks
arXiv - CS - Computation and Language. Pub Date: 2021-02-25, DOI: arxiv-2102.13019
Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin

The ability to perform arithmetic tasks is a remarkable trait of human intelligence and might form a critical component of more complex reasoning tasks. In this work, we investigate if the surface form of a number has any influence on how sequence-to-sequence language models learn simple arithmetic tasks such as addition and subtraction across a wide range of values. We find that how a number is represented in its surface form has a strong influence on the model's accuracy. In particular, the model fails to learn addition of five-digit numbers when using subwords (e.g., "32"), and it struggles to learn with character-level representations (e.g., "3 2"). By introducing position tokens (e.g., "3 10e1 2"), the model learns to accurately add and subtract numbers up to 60 digits. We conclude that modern pretrained language models can easily learn arithmetic from very few examples, as long as we use the proper surface representation. This result bolsters evidence that subword tokenizers and positional encodings are components in current transformer designs that might need improvement. Moreover, we show that regardless of the number of parameters and training examples, models cannot learn addition rules that are independent of the length of the numbers seen during training. Code to reproduce our experiments is available at https://github.com/castorini/transformers-arithmetic.
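To make the "position token" idea concrete, the sketch below converts an integer into an explicit place-value string along the lines of the abstract's example, where 32 becomes "3 10e1 2". This is an illustrative reading of that example, not the paper's actual preprocessing; the function name `to_position_tokens` is ours, and the tokenization in the castorini/transformers-arithmetic repository may differ in details (for instance, whether the units digit also carries a 10e0 marker).

```python
def to_position_tokens(n: int) -> str:
    """Encode an integer with explicit place-value ("position") tokens.

    Following the abstract's example, 32 -> "3 10e1 2": each digit except
    the units digit is followed by a 10e{k} marker giving its place value.
    This is an assumed reading of the example, not the paper's exact scheme.
    """
    digits = str(abs(n))
    tokens = []
    for i, d in enumerate(digits):
        tokens.append(d)
        power = len(digits) - 1 - i  # place value of this digit
        if power > 0:
            tokens.append(f"10e{power}")
    if n < 0:
        tokens.insert(0, "-")
    return " ".join(tokens)


if __name__ == "__main__":
    print(to_position_tokens(32))     # 3 10e1 2
    print(to_position_tokens(60417))  # 6 10e4 0 10e3 4 10e2 1 10e1 7
```

Under this representation, each digit's magnitude is spelled out in the input itself, so the model no longer has to infer place value from token position alone, which is the intuition the abstract gives for why such sequences are easier to learn.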

Updated: 2021-02-26