Neural Execution Engines: Learning to Execute Subroutines
arXiv - CS - Programming Languages. Pub Date: 2020-06-15, DOI: arxiv-2006.08084
Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi

A significant effort has been made to train neural networks that replicate algorithmic reasoning, but they often fail to learn the abstract concepts underlying these algorithms. This is evidenced by their inability to generalize to data distributions that are outside of their restricted training sets, namely larger inputs and unseen data. We study these generalization issues at the level of numerical subroutines that comprise common algorithms like sorting, shortest paths, and minimum spanning trees. First, we observe that transformer-based sequence-to-sequence models can learn subroutines like sorting a list of numbers, but their performance rapidly degrades as the length of lists grows beyond those found in the training set. We demonstrate that this is due to attention weights that lose fidelity with longer sequences, particularly when the input numbers are numerically similar. To address the issue, we propose a learned conditional masking mechanism, which enables the model to strongly generalize far outside of its training range with near-perfect accuracy on a variety of algorithms. Second, to generalize to unseen data, we show that encoding numbers with a binary representation leads to embeddings with rich structure once trained on downstream tasks like addition or multiplication. This allows the embedding to handle missing data by faithfully interpolating numbers not seen during training.
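The length-generalization fix described above is a learned conditional mask over attention. As a rough illustration only (in the paper the mask comes from a learned predictor conditioned on the computation's state; here it is supplied by hand, and the function name masked_attention is ours), the effect of such a mask is to drive the attention logits of excluded positions to negative infinity before the softmax, so attention cannot drift back to already-processed elements:

import math
import torch

def masked_attention(q, k, v, mask):
    # q, k, v: (batch, seq, dim); mask: (batch, seq) with 1 = attend, 0 = drop.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # Suppress masked positions so the softmax assigns them zero weight.
    scores = scores.masked_fill(mask.unsqueeze(1) == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Example: position 1 is treated as already emitted and masked out.
q = k = v = torch.randn(1, 4, 8)
mask = torch.tensor([[1, 0, 1, 1]])
out = masked_attention(q, k, v, mask)
print(out.shape)  # torch.Size([1, 4, 8])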
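The second idea, encoding numbers in binary, can be sketched in the same spirit: instead of a per-value lookup table, each integer is expanded into its fixed-width bit vector and projected through a learned linear layer, so a value never seen in training still shares bit features with values that were. This is a minimal sketch under assumed settings, not the authors' code; BIT_WIDTH, EMBED_DIM, and BinaryEmbedding are illustrative names.

import torch
import torch.nn as nn

BIT_WIDTH = 8   # assumed width; covers integers in [0, 255]
EMBED_DIM = 64  # assumed embedding size

def to_bits(x: torch.Tensor) -> torch.Tensor:
    # Expand non-negative integers into {0, 1} bit vectors of length BIT_WIDTH.
    shifts = torch.arange(BIT_WIDTH, device=x.device)
    return ((x.unsqueeze(-1) >> shifts) & 1).float()

class BinaryEmbedding(nn.Module):
    # Embed numbers via their binary expansion rather than a lookup table,
    # letting unseen values interpolate through shared bit features.
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(BIT_WIDTH, EMBED_DIM)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(to_bits(x))

emb = BinaryEmbedding()
print(emb(torch.tensor([3, 200, 255])).shape)  # torch.Size([3, 64])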

Updated: 2020-10-26