Random walk-based entity representation learning and re-ranking for entity search, Knowledge and Information Systems

Knowledge and Information Systems (IF 2.7), Pub Date: 2020-02-18, DOI: 10.1007/s10115-020-01445-4
Takahiro Komamizu

Linked Data (LD) has become a valuable source of factual records, and entity search is a fundamental task on LD: given a query consisting of a set of keywords, the task is to retrieve the relevant entities in LD. State-of-the-art approaches to entity search are based on information retrieval techniques. This paper first evaluates these approaches with a traditional metric, recall@k, to reveal how much room for improvement they leave. To find evidence of this potential, it then investigates the relationship between queries and answer entities in terms of path lengths on the LD graph. Building on this investigation, the paper addresses the learning of entity representations. Existing entity search methods rely on heuristics that decide which fields (i.e., predicates and related entities) make up an entity's representation. Because these heuristics require burdensome human decisions, this paper aims to remove that burden by using a graph proximity measure instead. To this end, it proposes RWRDoc, an RWR (random walk with restart)-based representation learning method that represents each entity as a weighted combination of the representations of the entities reachable from it, with weights given by RWR. RWRDoc is designed mainly to improve recall; as the experiments show, it is weak at ranking. To improve ranking quality, the paper proposes PPRSD (Personalized PageRank-based Score Distribution), a re-ranking method for the retrieved results that distributes the relevance scores computed by text-based entity search methods in a personalized PageRank manner. Experimental evaluations show that RWRDoc improves search quality in terms of recall@1000 and that PPRSD compensates for RWRDoc's insufficient ranking capability.
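
The two graph computations the abstract relies on, random walk with restart (RWR) for building entity representations and personalized PageRank for spreading text-based relevance scores, can be illustrated with the minimal Python sketch below. It is not the paper's implementation: the adjacency-list graph, the toy entities and their terms, the restart probability of 0.15, the fixed iteration count, and the exact weighting and re-ranking formulas are all assumptions chosen for illustration. The hypothetical `rwr_doc` builds an entity's term vector as an RWR-weighted sum of the term counts of reachable entities, `pprsd_rerank` reorders retrieved entities by personalized PageRank seeded with their normalized text scores, and `recall_at_k` computes the evaluation metric.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Assumed adjacency-list view of an LD graph: entity -> linked entities.
Graph = Dict[str, List[str]]


def personalized_pagerank(graph: Graph, seed: Dict[str, float],
                          restart: float = 0.15, iters: int = 100) -> Dict[str, float]:
    """Random walk with restart / personalized PageRank by power iteration.
    `seed` is the restart distribution and is assumed to sum to 1."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs} | set(seed)
    score = {n: seed.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        # With probability `restart` the walker jumps back to the seed.
        nxt = {n: restart * seed.get(n, 0.0) for n in nodes}
        for u in nodes:
            out = graph.get(u, [])
            if out:
                # Otherwise it follows a uniformly chosen outgoing link.
                share = (1 - restart) * score[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling entity: return its mass to the seed distribution.
                for n, w in seed.items():
                    nxt[n] += (1 - restart) * score[u] * w
        score = nxt
    return score


def rwr_doc(graph: Graph, texts: Dict[str, List[str]], entity: str,
            restart: float = 0.15, iters: int = 100) -> Dict[str, float]:
    """Hypothetical RWRDoc-style representation: term counts of every entity
    reachable from `entity`, weighted by its RWR proximity to `entity`."""
    proximity = personalized_pagerank(graph, {entity: 1.0}, restart, iters)
    rep: Dict[str, float] = defaultdict(float)
    for v, w in proximity.items():
        if w > 0:  # unreachable entities contribute nothing
            for term, tf in Counter(texts.get(v, [])).items():
                rep[term] += w * tf
    return dict(rep)


def pprsd_rerank(graph: Graph, text_scores: Dict[str, float],
                 restart: float = 0.15, iters: int = 100) -> List[str]:
    """Hypothetical PPRSD-style re-ranking: spread non-negative text-based
    scores over the graph with personalized PageRank, then re-sort."""
    total = sum(text_scores.values())
    if total <= 0:
        raise ValueError("text scores must have a positive sum")
    seed = {e: s / total for e, s in text_scores.items()}
    spread = personalized_pagerank(graph, seed, restart, iters)
    # Only the originally retrieved entities are re-ordered.
    return sorted(text_scores, key=lambda e: spread[e], reverse=True)


def recall_at_k(ranked: List[str], relevant: set, k: int) -> float:
    """Fraction of the relevant entities that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


# Toy graph and entity texts, invented purely for illustration.
GRAPH: Graph = {
    "Kyoto": ["Japan", "Kinkaku-ji"],
    "Kinkaku-ji": ["Kyoto"],
    "Japan": ["Kyoto", "Tokyo"],
    "Tokyo": ["Japan"],
}
TEXTS = {
    "Kyoto": ["kyoto", "city"],
    "Kinkaku-ji": ["golden", "pavilion", "temple"],
    "Japan": ["japan", "country"],
    "Tokyo": ["tokyo", "capital", "city"],
}

if __name__ == "__main__":
    print(rwr_doc(GRAPH, TEXTS, "Kinkaku-ji"))
    print(pprsd_rerank(GRAPH, {"Tokyo": 2.0, "Kyoto": 1.0, "Kinkaku-ji": 0.5}))
```

RWR and personalized PageRank are the same computation with different restart distributions (a single entity versus the normalized text scores), so one power-iteration routine serves both here; a graph the size of real LD would call for sparse matrices and a convergence tolerance rather than a fixed iteration count.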

Updated: 2020-02-18