Measuring Memorization Effect in Word-Level Neural Networks Probing
arXiv - CS - Computation and Language. Pub Date: 2020-06-29, DOI: arxiv-2006.16082
Rudolf Rosa, Tomáš Musil, David Mareček

Multiple studies have probed representations emerging in neural networks trained for end-to-end NLP tasks and examined what word-level linguistic information may be encoded in the representations. In classical probing, a classifier is trained on the representations to extract the target linguistic information. However, there is a threat of the classifier simply memorizing the linguistic labels for individual words, instead of extracting the linguistic abstractions from the representations, thus reporting false positive results. While considerable efforts have been made to minimize the memorization problem, the task of actually measuring the amount of memorization happening in the classifier has been understudied so far. In our work, we propose a simple general method for measuring the memorization effect, based on a symmetric selection of comparable sets of test words seen versus unseen in training. Our method can be used to explicitly quantify the amount of memorization happening in a probing setup, so that an adequate setup can be chosen and the results of the probing can be interpreted with a reliability estimate. We exemplify this by showcasing our method on a case study of probing for part of speech in a trained neural machine translation encoder.
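To make the seen-versus-unseen comparison concrete, here is a minimal sketch of how such a memorization measurement could be set up. It assumes a simple linear probe (logistic regression) over precomputed word representations and splits held-out data by word *type*, so that one test set shares word types with the probe's training data and the other does not; the gap between the two accuracies then estimates memorization. All function and variable names are hypothetical, and the sketch omits the frequency matching that a properly symmetric selection of comparable word sets would involve; it is not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def memorization_gap(reps, tags, words, seed=0):
    """Train a probe on word representations, then compare accuracy on
    test tokens whose word types were seen during probe training versus
    tokens of entirely unseen word types.

    reps:  (n_tokens, dim) array of representations (e.g., encoder states)
    tags:  (n_tokens,) array of linguistic labels (e.g., POS tags)
    words: (n_tokens,) array of word types for each token
    """
    rng = np.random.default_rng(seed)
    types = np.unique(words)
    rng.shuffle(types)
    # Split word *types* in half: one half may appear in probe training,
    # the other half is reserved so the probe never sees those types.
    seen_types = set(types[: len(types) // 2])
    seen_mask = np.array([w in seen_types for w in words])

    # Within the "seen" types, split tokens into probe-train and seen-test.
    idx = np.where(seen_mask)[0]
    rng.shuffle(idx)
    train_idx = idx[: len(idx) // 2]
    seen_test_idx = idx[len(idx) // 2 :]
    unseen_test_idx = np.where(~seen_mask)[0]

    probe = LogisticRegression(max_iter=1000)
    probe.fit(reps[train_idx], tags[train_idx])

    acc_seen = probe.score(reps[seen_test_idx], tags[seen_test_idx])
    acc_unseen = probe.score(reps[unseen_test_idx], tags[unseen_test_idx])
    # A large seen-minus-unseen gap suggests the probe memorized labels of
    # individual word types rather than reading abstractions off the
    # representations.
    return acc_seen, acc_unseen, acc_seen - acc_unseen
```

In this reading, a probe that truly extracts linguistic abstractions from the representations should score similarly on both test sets, while a probe that memorizes per-word labels will score markedly higher on the seen-type set.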

Updated: 2020-06-30