Semantic Drift in Multilingual Representations
Computational Linguistics (IF 3.7), Pub Date: 2020-11-01, DOI: 10.1162/coli_a_00382
Lisa Beinborn, Rochelle Choenni

Multilingual representations have mostly been evaluated based on their performance on specific tasks. In this article, we look beyond engineering goals and analyze the relations between languages in computational representations. We introduce a methodology for comparing languages based on their organization of semantic concepts. We propose to conduct an adapted version of representational similarity analysis of a selected set of concepts in computational multilingual representations. Using this analysis method, we can reconstruct a phylogenetic tree that closely resembles those assumed by linguistic experts. These results indicate that multilingual distributional representations that are only trained on monolingual text and bilingual dictionaries preserve relations between languages without the need for any etymological information. In addition, we propose a measure to identify semantic drift between language families. We perform experiments on word-based and sentence-based multilingual models and provide both quantitative results and qualitative examples. Analyses of semantic drift in multilingual representations can serve two purposes: They can indicate unwanted characteristics of the computational models and they provide a quantitative means to study linguistic phenomena across languages.
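
The comparison described in the abstract can be illustrated with a minimal sketch of representational similarity analysis. It is not the authors' implementation. It assumes each language is given as a list of vectors for the same concepts in the same order, that concept similarity is cosine similarity, and that two languages are compared by the Pearson correlation of their concept-similarity patterns. The abstract specifies none of these choices. The names `concept_similarities`, `language_distance`, and the `toy` data are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concept_similarities(vectors):
    """Pairwise similarities between all concepts within one language.

    Returns the upper triangle of the similarity matrix as a flat list.
    """
    pairs = combinations(range(len(vectors)), 2)
    return [cosine(vectors[i], vectors[j]) for i, j in pairs]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes nonzero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def language_distance(vectors_a, vectors_b):
    """Distance between two languages: 1 minus the correlation of how each
    language organizes the same concepts (assumed measure, see lead-in)."""
    return 1.0 - pearson(concept_similarities(vectors_a),
                         concept_similarities(vectors_b))


# Toy data (made up): four concepts in a 2-dimensional space per language.
# lang_b is lang_a rotated by 90 degrees, so its concept organization is identical.
toy = {
    "lang_a": [[1, 0], [0, 1], [1, 1], [1, -1]],
    "lang_b": [[0, 1], [-1, 0], [-1, 1], [1, 1]],
    "lang_c": [[1, 0], [1, 0.1], [0, 1], [-1, 0]],
}

distances = {
    (a, b): language_distance(toy[a], toy[b])
    for a, b in combinations(sorted(toy), 2)
}
```

Because only the similarities between concepts within a language enter the comparison, the languages do not need to share a coordinate system for this step. A pairwise distance table like `distances` could then be passed to a hierarchical clustering routine to obtain a tree over languages, which is one plausible reading of the tree reconstruction mentioned in the abstract.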

Updated: 2020-11-01