Fine-Grained Entity Linking, Journal of Web Semantics

Fine-Grained Entity Linking
Journal of Web Semantics (IF 2.1) Pub Date: 2020-08-26, DOI: 10.1016/j.websem.2020.100600
Henry Rosales-Méndez, Aidan Hogan, Barbara Poblete

The Entity Linking (EL) task involves linking mentions of entities in a text with their identifiers in a Knowledge Base (KB) such as Wikipedia, BabelNet, DBpedia, Freebase, Wikidata, YAGO, etc. Numerous techniques have been proposed to address this task over the years. However, not all works adopt the same convention regarding the entities that the EL task should target; for example, while some EL works target common entities like “interview” appearing in the KB, others only target named entities like “Michael Jackson”. The lack of consensus on this issue (and others) complicates research on the EL task; for example, how can the performance of EL systems be evaluated and compared when systems may target different types of entities? In this work, we first design a questionnaire to understand what kinds of mentions and links the EL research community believes should be targeted by the task. Based on these results, we propose a fine-grained categorization scheme for EL that distinguishes different types of mentions and links. We propose a vocabulary extension that allows such categories to be expressed in EL benchmark datasets. We then relabel (subsets of) three popular EL datasets according to our novel categorization scheme, and additionally discuss a tool used to semi-automate the labeling process. We next present the performance results of five EL systems for individual categories. We further extend EL systems with Word Sense Disambiguation and Coreference Resolution components, creating initial versions of what we call Fine-Grained Entity Linking (FEL) systems, and measure the impact on performance per category. Finally, we propose a configurable performance measure based on fuzzy sets that can be adapted for different application scenarios. Our results highlight a lack of consensus on the goals of the EL task, show that the evaluated systems do indeed target different entities, and further reveal some open challenges for the (F)EL task regarding more complex forms of reference for entities.

Updated: 2020-08-26
