Uncovering hidden semantics of set information in knowledge bases, Journal of Web Semantics



Uncovering hidden semantics of set information in knowledge bases
Journal of Web Semantics (IF 2.1), Pub Date: 2020-06-15, DOI: 10.1016/j.websem.2020.100588
Shrestha Ghosh, Simon Razniewski, Gerhard Weikum

Knowledge Bases (KBs) contain a wealth of structured information about entities and predicates. This paper focuses on set-valued predicates, i.e., the relationship between an entity and a set of entities. In KBs, this information is often represented in two formats: (i) via counting predicates such as numberOfChildren and staffSize, which store aggregated integers, and (ii) via enumerating predicates such as parentOf and worksFor, which store individual set memberships. The two formats are typically complementary: unlike enumerating predicates, counting predicates do not reveal individual members, but they are more likely to be informative about the true set size. This coexistence could enable interesting applications in question answering and KB curation.
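
To make the distinction concrete, the following is a minimal sketch of how a counting predicate and an enumerating predicate can describe the same set, and how comparing them can reveal members missing from the KB. The entity names, values, and helper functions (`stored_counts`, `enumerated_sizes`, `missing_members`) are hypothetical and chosen only for illustration; they are not taken from the paper or from any real KB.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples; all entities and values are invented.
TRIPLES = [
    ("Alice", "numberOfChildren", 3),   # counting predicate: aggregated integer
    ("Alice", "parentOf", "Bob"),       # enumerating predicate: one set member
    ("Alice", "parentOf", "Carol"),
    ("Dave", "numberOfChildren", 1),
    ("Dave", "parentOf", "Erin"),
]


def stored_counts(triples, predicate):
    """Read the integer stored by a counting predicate for each subject."""
    return {s: o for s, p, o in triples if p == predicate}


def enumerated_sizes(triples, predicate):
    """Count the distinct objects each subject has for an enumerating predicate."""
    members = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            members[s].add(o)
    return {s: len(objs) for s, objs in members.items()}


def missing_members(triples, counting, enumerating):
    """For each subject with a stored count, estimate how many members are not enumerated."""
    counts = stored_counts(triples, counting)
    sizes = enumerated_sizes(triples, enumerating)
    return {s: max(c - sizes.get(s, 0), 0) for s, c in counts.items()}


gaps = missing_members(TRIPLES, "numberOfChildren", "parentOf")
print(gaps)
```

In this toy data the stored count for Alice exceeds the number of enumerated children, which is the kind of signal that could support KB curation.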

In this paper we aim to uncover this hidden knowledge. We proceed in two steps. (i) We identify set-valued predicates among a given KB's predicates via statistical and embedding-based features. (ii) We link counting predicates and enumerating predicates by a combination of co-occurrence, correlation and textual relatedness metrics. We analyse the prevalence of count information in four prominent knowledge bases, and show that our linking method achieves an F1 score of up to 0.55 in set predicate identification, versus 0.40 for random selection, and normalized discounted gains of up to 0.84 at position 1 and 0.75 at position 3 in relevant predicate alignments. Our predicate alignments are showcased in a demonstration system available at https://counqer.mpi-inf.mpg.de/spo.
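
The linking step can be illustrated with a rough sketch of two of the named signal types, co-occurrence and correlation, for one candidate pair of predicates. The abstract does not define these metrics, so the choices here (Jaccard overlap of subjects for co-occurrence, Pearson correlation between stored counts and enumerated set sizes) are assumptions, as are the function names and the toy data. Textual relatedness is omitted.

```python
import math


def cooccurrence(counts, sizes):
    """Assumed metric: Jaccard overlap of the subjects carrying each predicate."""
    a, b = set(counts), set(sizes)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def alignment_features(counts, sizes):
    """Score a (counting, enumerating) predicate pair on the subjects they share."""
    shared = sorted(set(counts) & set(sizes))
    return {
        "cooccurrence": cooccurrence(counts, sizes),
        "correlation": pearson(
            [counts[s] for s in shared], [sizes[s] for s in shared]
        ),
    }


# Hypothetical per-subject values: stored counts vs. number of enumerated members.
counts = {"e1": 3, "e2": 1, "e3": 5, "e4": 2}
sizes = {"e1": 2, "e2": 1, "e3": 4, "e5": 1}
features = alignment_features(counts, sizes)
print(features)
```

A pair of predicates that share many subjects and whose values rise and fall together would score high on both features; how such features are weighted or combined into a ranking is left open here.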


Updated: 2020-06-15
