Multilingual Probing Tasks for Word Representations
Computational Linguistics (IF 3.7), Pub Date: 2020-06-01, DOI: 10.1162/coli_a_00376
Gözde Gül Şahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych
Despite an ever-growing number of word representation models introduced for a large number of languages, there is no standardized technique for gaining insight into what these models capture. Such insights would help the community estimate downstream task performance and design more informed neural architectures, while avoiding extensive experimentation that requires substantial computational resources not all researchers have access to. A recent development in NLP is the use of simple classification tasks, also called probing tasks, that test for a single linguistic feature such as part-of-speech. Existing studies mostly focus on exploring the linguistic information encoded in continuous representations of English text. However, from a typological perspective, morphologically poor English is rather an outlier: the information encoded by word order and function words in English is often stored at a subword, morphological level in other languages. To address this, we introduce 15 type-level probing tasks, such as case marking, possession, word length, morphological tag count, and pseudoword identification, for 24 languages. We present a reusable methodology for the creation and evaluation of such tests in a multilingual setting, which is challenging because of a lack of resources, lower-quality tools, and differences among languages. We then present experiments on several diverse multilingual word embedding models, in which we relate probing task performance for a diverse set of languages to a range of five classic NLP tasks: POS tagging, dependency parsing, semantic role labeling, named entity recognition, and natural language inference. We find that a number of probing tests have a significantly high positive correlation with the downstream tasks, especially for morphologically rich languages.
We show that our tests can be used to explore word embeddings or black-box neural models for linguistic cues in a multilingual setting. We release the probing data sets and the evaluation suite LINSPECTOR at https://github.com/UKPLab/linspector.
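To illustrate the general idea of a type-level probing task (not the paper's exact setup), the sketch below trains a simple linear classifier to predict a binary linguistic feature from fixed word embeddings; the embeddings and labels here are synthetic stand-ins, and high probe accuracy is read as evidence that the feature is linearly recoverable from the representation.

```python
# Minimal probing-task sketch. The embeddings, labels, and task are
# simulated for illustration; a real probe would use pretrained word
# vectors and gold morphological annotations (e.g., case marking).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated 300-d embeddings for 1,000 word types, each labeled with a
# hypothetical binary feature made linearly recoverable by construction.
X = rng.normal(size=(1000, 300))
w = rng.normal(size=300)
y = (X @ w > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# The probe itself: a deliberately simple classifier, so that high test
# accuracy reflects information in the embeddings rather than probe power.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = probe.score(X_test, y_test)
```

In practice one would run such a probe per language and per feature, then correlate probe accuracies with downstream task scores across languages.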
