EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses
Knowledge-Based Systems ( IF 8.8 ) Pub Date : 2021-02-27 , DOI: 10.1016/j.knosys.2021.106902
Eniafe Festus Ayetiran , Petr Sojka , Vít Novotný

Many language applications require word semantics as a core part of their processing pipeline, either for precise meaning inference or for semantic similarity. Multi-sense embeddings (m-se) can be exploited to meet this requirement. m-se seek to represent each word by its distinct senses in order to resolve the conflation of meanings of words used in different contexts. Previous work usually approaches this task by training a model on a large corpus and often ignores the effect and usefulness of the semantic relations offered by lexical resources. However, even with large training data, coverage of all possible word senses remains an issue. In addition, a considerable percentage of contextual semantic knowledge is never learned because a huge number of possible distributional semantic structures are never explored. In this paper, we leverage the rich semantic structures in WordNet, using a graph-theoretic walk technique over word senses, to enhance the quality of multi-sense embeddings. This algorithm composes enriched texts from the original texts. Furthermore, we derive new distributional semantic similarity measures for m-se from prior ones. We adapt these measures to the word sense disambiguation (wsd) aspect of our experiment. We report evaluation results on 11 benchmark datasets involving wsd and word similarity tasks and show that our method for enhancing distributional semantic structures improves embedding quality over the baselines. Despite the small training data, it achieves state-of-the-art performance on some of the datasets.
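The abstract does not spell out the walk itself, so the following is only a minimal sketch of the general idea of composing enriched text by walking a graph of word senses. It is not the authors' algorithm. The toy graph `SENSE_GRAPH`, the sense labels, the breadth-first strategy, and the names `walk_senses` and `enrich` are all assumptions made for illustration; a real setup would read senses and relations from WordNet.

```python
from collections import deque

# Hypothetical toy sense graph (NOT real WordNet data): each key is a word
# sense, each value lists senses related to it, e.g. by hypernymy.
SENSE_GRAPH = {
    "bank.n.01": ["slope.n.01"],
    "bank.n.02": ["financial_institution.n.01"],
    "financial_institution.n.01": ["institution.n.01"],
    "slope.n.01": ["geological_formation.n.01"],
}


def walk_senses(start, graph, max_depth=2):
    """Breadth-first walk from `start`; return senses within `max_depth` edges."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        sense, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(sense, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append((neighbour, depth + 1))
    return order


def enrich(tagged_tokens, graph, max_depth=2):
    """Insert related senses after each sense-tagged token; keep other tokens."""
    enriched = []
    for token in tagged_tokens:
        enriched.append(token)
        if token in graph:
            enriched.extend(walk_senses(token, graph, max_depth))
    return enriched


# Assumed input: a sentence whose ambiguous word is already sense-tagged.
sentence = ["she", "deposited", "cash", "at", "the", "bank.n.02"]
enriched_sentence = enrich(sentence, SENSE_GRAPH)
print(enriched_sentence)
```

In this sketch the enriched sentence places related senses next to the tagged word, so an embedding model trained on it would see co-occurrences that the original text never contained. The depth limit is an illustrative choice to keep the walk from drifting to very generic senses.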




Updated: 2021-03-02