What they do when in doubt: a study of inductive biases in seq2seq learners
arXiv - CS - Computation and Language. Pub Date: 2020-06-26, DOI: arxiv-2006.14953
Eugene Kharitonov and Rahma Chaabouni

Sequence-to-sequence (seq2seq) learners are widely used, but we still have only limited knowledge about what inductive biases shape the way they generalize. We address that by investigating how popular seq2seq learners generalize in tasks that have high ambiguity in the training data. We use SCAN and three new tasks to study learners' preferences for memorization, arithmetic, hierarchical, and compositional reasoning. Further, we connect to Solomonoff's theory of induction and propose to use description length as a principled and sensitive measure of inductive biases. In our experimental study, we find that LSTM-based learners can learn to perform counting, addition, and multiplication by a constant from a single training example. Furthermore, Transformer and LSTM-based learners show a bias toward the hierarchical induction over the linear one, while CNN-based learners prefer the opposite. On the SCAN dataset, we find that CNN-based, and, to a lesser degree, Transformer- and LSTM-based learners have a preference for compositional generalization over memorization. Finally, across all our experiments, description length proved to be a sensitive measure of inductive biases.
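The description-length measure mentioned above can be made concrete with online (prequential) coding: the training outputs are transmitted one at a time, each encoded under the predictions of a learner fit only on the examples sent so far, so a learner whose bias matches the data yields a shorter code. The sketch below is a minimal illustration of that idea under this assumption; the Laplace-smoothed unigram "learner", the function name prequential_bits, and the toy SCAN-like vocabulary are illustrative stand-ins, not the paper's actual models or code.

import math
from collections import Counter

def prequential_bits(outputs, vocab):
    """Bits needed to encode each output symbol when the model is refit on
    all previously transmitted symbols (add-one / Laplace smoothing)."""
    counts, seen, total_bits = Counter(), 0, 0.0
    for symbol in outputs:
        p = (counts[symbol] + 1) / (seen + len(vocab))  # predictive probability of the next symbol
        total_bits += -math.log2(p)                     # code length of this symbol
        counts[symbol] += 1                             # transmit it, then update the model
        seen += 1
    return total_bits

vocab = ["walk", "jump", "run", "look"]
# A learner whose predictions concentrate on the observed continuation yields a
# shorter code (stronger inductive bias toward it) than a more diffuse learner.
print(prequential_bits(["walk"] * 8, vocab))                          # roughly 7.4 bits
print(prequential_bits(["walk", "jump", "run", "look"] * 2, vocab))   # roughly 18.7 bits

In the paper's setting, the toy unigram model would presumably be replaced by the seq2seq learner itself, and the code lengths of competing continuations of an ambiguous training set would be compared to reveal which generalization the learner prefers.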

Updated: 2020-06-29