Regularized Training of Nearest Neighbor Language Models
arXiv - CS - Computation and Language. Pub Date: 2021-09-16, DOI: arxiv-2109.08249
Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Josh Susskind

Including memory banks in a natural language processing architecture increases model capacity by equipping it with additional data at inference time. In this paper, we build upon $k$NN-LM \citep{khandelwal20generalization}, which uses a pre-trained language model together with an exhaustive $k$NN search through the training data (memory bank) to achieve state-of-the-art results. We investigate whether we can improve $k$NN-LM performance by instead training an LM with the knowledge that a $k$NN search will be applied post hoc. We achieve significant improvements with our method on language modeling tasks on \texttt{WIKI-2} and \texttt{WIKI-103}. The main phenomenon we encounter is that adding a simple L2 regularization on the activations (not the weights) of the model, a transformer, improves the post-hoc $k$NN classification performance. We explore some possible reasons for this improvement. In particular, we find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low-frequency ones.
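As a rough illustration of the two components the abstract describes, the sketch below adds an L2 penalty on the transformer's activations to the standard cross-entropy language-modeling loss, and interpolates the LM's next-token distribution with a nearest-neighbour distribution at inference time (the standard $k$NN-LM mixture of \citet{khandelwal20generalization}). This is a minimal sketch, not the authors' code: the PyTorch/Hugging Face-style model interface, the coefficient names `l2_coeff` and `lam`, and the choice of the final hidden states as the regularized activations are assumptions made for illustration.

```python
# Minimal sketch (assumptions noted above), using PyTorch and a Hugging
# Face-style causal LM that exposes .logits and .hidden_states.
import torch
import torch.nn.functional as F

def lm_loss_with_activation_l2(model, input_ids, labels, l2_coeff=1e-3):
    """Cross-entropy LM loss plus an L2 penalty on the model's activations
    (final hidden states), not on its weights."""
    outputs = model(input_ids=input_ids, output_hidden_states=True)
    logits = outputs.logits                 # (batch, seq, vocab)
    hidden = outputs.hidden_states[-1]      # (batch, seq, dim); also the kNN keys
    ce = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
    )
    l2 = hidden.pow(2).sum(dim=-1).mean()   # mean squared L2 norm of activations
    return ce + l2_coeff * l2

def knn_lm_interpolate(p_lm, p_knn, lam=0.25):
    """Post-hoc kNN-LM mixture of the parametric LM distribution and the
    distribution induced by nearest-neighbour retrieval over the memory bank."""
    return lam * p_knn + (1.0 - lam) * p_lm
```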

Updated: 2021-09-20