Regularized Training of Nearest Neighbor Language Models
arXiv - CS - Computation and Language. Pub Date: 2021-09-16, DOI: arxiv-2109.08249
Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Josh Susskind

Including memory banks in a natural language processing architecture increases model capacity by equipping it with additional data at inference time. In this paper, we build upon $k$NN-LM \citep{khandelwal20generalization}, which uses a pre-trained language model together with an exhaustive $k$NN search through the training data (memory bank) to achieve state-of-the-art results. We investigate whether we can improve $k$NN-LM performance by instead training an LM with the knowledge that a $k$NN search will be applied post hoc. We achieve significant improvements with our method on language modeling tasks on \texttt{WIKI-2} and \texttt{WIKI-103}. The main phenomenon we encounter is that adding a simple L2 regularization on the activations (not the weights) of the model, a transformer, improves the post-hoc $k$NN classification performance. We explore some possible reasons for this improvement. In particular, we find that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low-frequency ones.
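As a rough illustration of the two components the abstract describes, the sketch below adds an L2 penalty on the transformer's activations to the standard cross-entropy language-modeling loss, and interpolates the LM's next-token distribution with a nearest-neighbour distribution at inference time (the standard $k$NN-LM mixture of \citet{khandelwal20generalization}). This is a minimal sketch, not the authors' code: the PyTorch/Hugging Face-style model interface, the coefficient names `l2_coeff` and `lam`, and the choice of the final hidden states as the regularized activations are assumptions made for illustration.

```python
# Minimal sketch (assumptions noted above), using PyTorch and a Hugging
# Face-style causal LM that exposes .logits and .hidden_states.
import torch
import torch.nn.functional as F

def lm_loss_with_activation_l2(model, input_ids, labels, l2_coeff=1e-3):
    """Cross-entropy LM loss plus an L2 penalty on the model's activations
    (final hidden states), not on its weights."""
    outputs = model(input_ids=input_ids, output_hidden_states=True)
    logits = outputs.logits                 # (batch, seq, vocab)
    hidden = outputs.hidden_states[-1]      # (batch, seq, dim); also the kNN keys
    ce = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
    )
    l2 = hidden.pow(2).sum(dim=-1).mean()   # mean squared L2 norm of activations
    return ce + l2_coeff * l2

def knn_lm_interpolate(p_lm, p_knn, lam=0.25):
    """Post-hoc kNN-LM mixture of the parametric LM distribution and the
    distribution induced by nearest-neighbour retrieval over the memory bank."""
    return lam * p_knn + (1.0 - lam) * p_lm
```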

Updated: 2021-09-20