Measuring similarity and relatedness using multiple semantic relations in WordNet
Knowledge and Information Systems (IF 2.7), Pub Date: 2019-08-01, DOI: 10.1007/s10115-019-01387-6
Xinhua Zhu, Xuechen Yang, Yanyi Huang, Qingsong Guo, Bo Zhang

Semantic similarity and relatedness computation has attracted increasing attention from researchers. Most previous studies, including edge-based and information content-based methods, rely on a single semantic relation in WordNet, such as the “is-a” relation. However, measurements based solely on the “is-a” relation may have hit a performance ceiling, caused by reliance on a single kind of semantic relation (semantic unicity) and by inadequate calculation: the computed results for some word pairs are too small and deviate significantly from human judgments. To address this problem, we propose the following solutions: (1) We introduce the notion of the nearest common descendant to supplement the commonalities between concepts, following genetics theory. (2) We design targeted methods for different incomplete semantic relations, so that each relation can participate in similarity and relatedness computation in its most appropriate manner. (3) We cross-use the incomplete semantic relations similar-to and antonymy to address the challenge of measuring adjective and adverb similarity/relatedness in WordNet. (4) We propose a targeted independent computation and largest contribution aggregation method to break through the performance ceiling of similarity/relatedness measurements based on the single “is-a” relation. We evaluate our proposed model on seven widely used datasets. The evaluations indicate that our method significantly improves on existing methods based on the single “is-a” relation: the best Pearson correlation coefficient with human judgments on both MC30 and RG65 is raised to 0.9. As the semantic relations in WordNet are developed and enriched, our proposed model can be expected to play a more prominent role.
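
The idea in point (4), scoring each semantic relation independently and keeping the largest contribution, can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the toy concept graphs `IS_A` and `PART_OF` are hypothetical and not WordNet data, `path_similarity` is a generic inverse-path-length measure standing in for the paper's targeted per-relation methods, and `aggregate_max` is a hypothetical name. The nearest common descendant from point (1) is not modelled.

```python
from collections import deque

# Hypothetical toy graphs (node -> nodes one step "up"); NOT real WordNet data.
IS_A = {
    "car": ["vehicle"],
    "bicycle": ["vehicle"],
    "vehicle": ["artifact"],
    "wheel": ["artifact"],
    "artifact": [],
}
PART_OF = {"wheel": ["car", "bicycle"]}  # part -> wholes

RELATIONS = {"is_a": IS_A, "part_of": PART_OF}


def upward_distances(node, edges):
    """Breadth-first search upward; maps each reachable node to its distance."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for parent in edges.get(cur, []):
            if parent not in dist:
                dist[parent] = dist[cur] + 1
                queue.append(parent)
    return dist


def path_similarity(a, b, edges):
    """Generic stand-in measure: 1 / (1 + shortest path via a shared node)."""
    da, db = upward_distances(a, edges), upward_distances(b, edges)
    common = set(da) & set(db)
    if not common:
        return 0.0  # the two concepts are not connected by this relation
    return 1.0 / (1 + min(da[c] + db[c] for c in common))


def aggregate_max(a, b, relations):
    """Score every relation on its own, then keep the largest contribution."""
    scores = {name: path_similarity(a, b, edges) for name, edges in relations.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores


best_relation, best_score, all_scores = aggregate_max("wheel", "car", RELATIONS)
print(best_relation, best_score, all_scores)
```

In this toy graph, "wheel" and "car" are three "is-a" edges apart (score 0.25) but directly linked by part-of (score 0.5), so the aggregate takes the part-of score. This mirrors the abstract's complaint that "is-a"-only results for some word pairs are too small, without claiming to reproduce the authors' actual measures.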

Updated: 2019-08-01
