Semantic relatedness algorithm for keyword sets of geographic metadata, Cartography and Geographic Information Science



Semantic relatedness algorithm for keyword sets of geographic metadata
Cartography and Geographic Information Science ( IF 2.6 ) Pub Date : 2019-09-20 , DOI: 10.1080/15230406.2019.1647797
Zugang Chen, Yaping Yang


ABSTRACT

Advances in linked geospatial data, recommender systems, and geographic information retrieval have created an urgent need to assess the overall semantic relatedness between keyword sets of geographic metadata. This study proposes a new model for computing the semantic relatedness between any two keyword sets of geographic metadata stored in current global spatial data infrastructures. In this model, the overall semantic relatedness is derived by pairing the keywords that are found to be most relevant to each other and averaging their relatedness. To find the most relevant keywords across two keyword sets precisely, the keywords in a keyword set of geographic metadata are divided into three kinds: thesaurus elements, WordNet elements, and statistical elements. Three measures are proposed: the thesaurus-lexical relatedness measure (TLRM) for two thesaurus elements; the extended thesaurus-lexical relatedness measure (ETLRM) for two WordNet elements or for a thesaurus element and a WordNet element; and the Longest Common Substring method for two statistical elements. A human-judgment data set, the geographic-metadata's keyword set relatedness dataset, was created to evaluate the precision of semantic relatedness measures for keyword sets of geographic metadata. The proposed method was evaluated against the human-generated relatedness judgments and compared with the Jaccard method and the Vector Space Model. The results demonstrated that the proposed method achieved a high correlation with human judgments and outperformed the existing methods. Finally, the proposed method was applied to quantitatively linked geospatial data.
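The pairing-and-averaging idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the symmetric best-match averaging in `set_relatedness`, the normalization in `lcs_relatedness` (longest common substring length divided by the longer keyword's length), and all function names are assumptions made for illustration. Only the Longest Common Substring measure is sketched; TLRM and ETLRM depend on a thesaurus and WordNet and are not reproduced here. A Jaccard baseline is included for comparison.

```python
from typing import Callable, Sequence


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def lcs_relatedness(a: str, b: str) -> float:
    """Assumed normalization: LCS length over the longer keyword's length."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    return longest_common_substring(a, b) / max(len(a), len(b))


def set_relatedness(
    set_a: Sequence[str],
    set_b: Sequence[str],
    rel: Callable[[str, str], float] = lcs_relatedness,
) -> float:
    """Pair each keyword with its best match in the other set, then average.

    The symmetric averaging over both directions is an assumption.
    """
    if not set_a or not set_b:
        return 0.0
    best_a = [max(rel(a, b) for b in set_b) for a in set_a]
    best_b = [max(rel(b, a) for a in set_a) for b in set_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def jaccard(set_a: Sequence[str], set_b: Sequence[str]) -> float:
    """Baseline: share of exactly matching keywords."""
    a, b = set(set_a), set(set_b)
    return len(a & b) / len(a | b) if a | b else 0.0


if __name__ == "__main__":
    first = ["land cover", "soil"]
    second = ["landcover", "soil moisture"]
    print("best-match average:", set_relatedness(first, second))
    print("jaccard:", jaccard(first, second))
```

In this toy input the two sets share no identical keyword, so the Jaccard baseline is zero, while the best-match average still credits near-matches such as "land cover" and "landcover".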


Updated: 2019-09-20
