Identifying and exploiting target entity type information for ad hoc entity retrieval
Information Retrieval Journal (IF 1.7), Pub Date: 2018-12-05, DOI: 10.1007/s10791-018-9346-x
Darío Garigliotti, Faegheh Hasibi, Krisztian Balog

Today, the practice of returning entities from a knowledge base in response to search queries has become widespread. One of the distinctive characteristics of entities is that they are typed, i.e., assigned to some hierarchically organized type system (type taxonomy). The primary objective of this paper is to gain a better understanding of how entity type information can be utilized in entity retrieval. We perform this investigation in two settings: first, in an idealized “oracle” setting, assuming that we know the distribution of target types of the relevant entities for a given query; and second, in a realistic scenario, where target entity types are identified automatically based on the keyword query. We perform a thorough analysis of three main aspects: (i) the choice of type taxonomy, (ii) the representation of hierarchical type information, and (iii) the combination of type-based and term-based similarity in the retrieval model. Using a standard entity search test collection based on DBpedia, we show that type information can significantly and substantially improve retrieval performance, yielding up to 67% relative improvement in terms of NDCG@10 over a strong text-only baseline in an oracle setting. We further show that using automatic target type detection, we can outperform the text-only baseline by 44% in terms of NDCG@10. This is as good as, and sometimes even better than, what is attainable by using explicit target type information provided by humans. These results indicate that identifying target entity types of queries is challenging even for humans, and they attest to the effectiveness of our proposed automatic approach.
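
The abstract reports its gains as relative improvement in NDCG@10. As a reading aid, here is a minimal sketch of how such a figure could be computed. It assumes the common formulation with linear gain and a log2 rank discount; the paper may use a different variant. The function names and example values are hypothetical and not taken from the paper.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k graded relevance values.

    Assumption: linear gain with a log2(rank + 1) discount, ranks starting at 1.
    """
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_gains, judged_gains, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_gains: relevance grades of the returned entities, in rank order.
    judged_gains: relevance grades of all judged entities for the query.
    """
    ideal = dcg_at_k(sorted(judged_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


def relative_improvement(baseline, system):
    """Relative improvement of a system score over a baseline score."""
    return (system - baseline) / baseline


# Hypothetical example: one relevant entity, returned at rank 2 instead of rank 1.
score = ndcg_at_k([0, 1], [1, 0])
```

Under this definition, a relative improvement of 67% means the system's NDCG@10 is 1.67 times the baseline's, not 67 percentage points higher.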

Updated: 2018-12-05