Using Embedding-Based Similarities to Improve Lexical Resources, Lobachevskii Journal of Mathematics

Lobachevskii Journal of Mathematics (IF 0.8), Pub Date: 2021-08-09, DOI: 10.1134/s1995080221070167
N. V. Loukachevitch, M. M. Tikhomirov, E. A. Parkhomenko

Abstract

In this paper we discuss the usefulness of applying semi-automatic checking procedures to existing thesauri for natural language processing, which are large, manually created lexical-semantic resources. The procedures are based on computing word vector representations (embeddings) and word semantic similarities from large text collections. The first procedure analyses discrepancies between corpus-based and thesaurus-based word similarities. The second procedure compares the hypernyms (more general words) described in a resource with those predicted from a relevant text collection. We applied the procedures to the verification of RuWordNet, a Russian wordnet. Both procedures helped to find significant mistakes and inconsistencies in the word sense descriptions of RuWordNet that were difficult to detect because of the size of the resource. The proposed procedures also demonstrate that an existing semantic resource can be quickly adapted to a new domain.
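
The idea behind the first procedure can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors, the toy thesaurus, the function names, and the parameters `k` and `min_similarity` are all assumptions made for illustration. For each word, the sketch takes its nearest neighbours by cosine similarity and flags those that the thesaurus does not list as related, so that a human editor can review them.

```python
import math

# Toy embeddings (hypothetical values); a real run would load vectors
# trained on a large text collection.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.0],
    "dog": [0.7, 0.3, 0.1],
    "car": [0.0, 0.2, 0.9],
}

# Toy thesaurus (hypothetical): word -> words it is linked to
# (synonyms, hypernyms, hyponyms, and so on).
THESAURUS_RELATED = {
    "cat": {"kitten", "dog"},
    "kitten": {"cat"},
    "dog": {"cat"},
    "car": set(),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, embeddings, k, min_similarity):
    """Return up to k most similar words whose similarity passes the threshold."""
    scored = [
        (other, cosine(embeddings[word], vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, sim in scored[:k] if sim >= min_similarity]


def find_discrepancies(embeddings, thesaurus_related, k=1, min_similarity=0.5):
    """Map each word to corpus neighbours that the thesaurus does not link to it."""
    suspicious = {}
    for word in embeddings:
        related = thesaurus_related.get(word, set())
        unlinked = [
            n
            for n in nearest_neighbours(word, embeddings, k, min_similarity)
            if n not in related
        ]
        if unlinked:
            suspicious[word] = unlinked
    return suspicious


if __name__ == "__main__":
    for word, unlinked in find_discrepancies(EMBEDDINGS, THESAURUS_RELATED).items():
        print(word, "->", unlinked)
```

In the toy data, the nearest neighbour of `dog` is `kitten`, which the toy thesaurus does not link to `dog`, so the pair is flagged for review. The threshold keeps `car` from being flagged merely because every word has some nearest neighbour. A flagged pair is only a candidate: it may point to a missing relation or sense in the thesaurus, or to noise in the corpus statistics.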

Updated: 2021-08-10
