Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings

arXiv - CS - Computation and Language. Pub Date: 2020-03-23, DOI: arxiv-2003.10224
Christos Xypolopoulos, Antoine J.-P. Tixier, Michalis Vazirgiannis

The number of senses of a given word, or polysemy, is a very subjective notion that varies widely across annotators and resources. We propose a novel method to estimate polysemy, based on simple geometry in the contextual embedding space. Our approach is fully unsupervised and purely data-driven. We show through rigorous experiments that our rankings are well correlated (with strong statistical significance) with 6 different rankings derived from well-known human-constructed resources such as WordNet, OntoNotes, Oxford, and Wikipedia, for 6 different standard metrics. We also visualize and analyze the correlation between the human rankings. A valuable by-product of our method is the ability to sample, at no extra cost, sentences containing different senses of a given word. Finally, the fully unsupervised nature of our method makes it applicable to any language. Code and data are publicly available at https://github.com/ksipos/polysemy-assessment.
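The abstract does not spell out the geometry, so the following is only an illustrative sketch of how a multiresolution-grid score might look, not the authors' implementation (which is in the repository linked above). It assumes each occurrence of a word has already been mapped to a low-dimensional point scaled to the unit hypercube, and it scores the word by the fraction of grid cells its points occupy at several resolutions. The function name `grid_coverage_score`, the number of levels, and the weighting of finer levels are all hypothetical choices made for this example.

```python
import numpy as np


def grid_coverage_score(points, levels=4):
    """Hypothetical polysemy proxy: weighted grid coverage over several resolutions.

    points: array of shape (n_occurrences, d), assumed already scaled to [0, 1]^d.
    levels: number of grid resolutions; level l splits each axis into 2**l bins.
    """
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[0] == 0:
        raise ValueError("points must be a non-empty 2-D array")
    d = points.shape[1]

    score = 0.0
    for level in range(1, levels + 1):
        bins = 2 ** level
        # Integer cell index along each axis; clip so that 1.0 lands in the last bin.
        cells = np.clip(np.floor(points * bins).astype(int), 0, bins - 1)
        occupied = len({tuple(cell) for cell in cells})
        coverage = occupied / bins ** d
        # Assumed weighting: finer grids count more than coarse ones.
        score += coverage / 2 ** (levels - level)
    return score


# Occurrences clustered in one region versus spread over the space.
tight = [[0.10, 0.10], [0.11, 0.12]]
spread = [[0.1, 0.1], [0.9, 0.9], [0.1, 0.9], [0.9, 0.1]]
print(grid_coverage_score(tight, levels=2))
print(grid_coverage_score(spread, levels=2))
```

Under these assumptions, a word whose occurrences spread over many cells scores higher than one whose occurrences cluster together, which is the intuition behind treating spatial coverage as a proxy for the number of senses.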

Updated: 2020-03-24
