Reasoning and querying web-scale open data based on DL-LiteA in a divide-and-conquer way
Journal of Web Semantics (IF 2.1), Pub Date: 2019-02-06, DOI: 10.1016/j.websem.2019.01.003
Zhenzhen Gu, Songmao Zhang, Cungen Cao

We propose to use DL-LiteA techniques to reason over and query Web-scale open data (knowledge bases) described with Semantic Web standards such as RDF and OWL, owing to the language's low reasoning complexity and suitable expressivity. When facing real-life scalability challenges, reasoning and query answering may become infeasible due to two factors. First, for both satisfiability checking and conjunctive query answering, a number of queries polynomial in the size of the schema knowledge of the corresponding knowledge bases (KBs) may need to be answered over their data layers. Second, for KBs with massive numbers of individual assertions, evaluating even a single query over the data layer may be highly time-consuming. This impels us to seek a divide-and-conquer reasoning and query answering approach for DL-LiteA, whose basic idea is to partition both KBs and queries into smaller chunks and to decompose the original reasoning and query answering tasks into a group of independent sub-tasks, so that the overall performance can be improved through parallelization and distribution techniques. The challenge in designing such an approach lies in carrying out the partitioning and reasoning reduction in a sound and complete way. Motivated by hash partitioning of RDF graphs, we expect the smaller KB chunks to be local with respect to both satisfiability checking and simple-query answering. Here, simple-queries are conjunctive queries whose atoms share a common variable or individual. For query answering, we expect to partition a query into smaller simple-queries and evaluate them over the smaller KB chunks. Under these expectations, our divide-and-conquer approach is constructed from both theoretical and practical perspectives. Theoretically, we define KB partitions and query partitions and identify necessary and sufficient conditions for determining whether a KB partition has the desired features. Practically, based on these theoretical results, we describe concrete ways of partitioning KBs and queries as well as of evaluating query partitions over KB partitions. Moreover, we provide a strategy for optimizing the evaluation of query partitions over KB partitions, which further improves the overall query answering performance. To verify our approach, we chose two Web-scale open datasets, DBpedia and the BTC 2012 dataset. The empirical results indicate that the proposed approach opens new possibilities for realizing performance-critical applications on the Web with both high expressivity and scalability.
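As a concrete illustration of the two partitioning ideas sketched in the abstract, the following minimal Python sketch is purely illustrative and not the authors' implementation; partition_abox, partition_query, is_simple, and the toy assertions are hypothetical names introduced here. It hashes ABox assertions into chunks by the individuals they mention, in the spirit of the hash partitioning of RDF graphs the abstract refers to, and greedily splits a conjunctive query into simple-queries, i.e., groups of atoms that share a common variable or individual.

```python
from collections import defaultdict

# --- KB partitioning: hash ABox assertions by the individuals they mention ---
# Concept assertions C(a) go to the chunk of hash(a); role assertions R(a, b)
# are replicated to the chunks of both hash(a) and hash(b), so that each chunk
# keeps enough context to be "local" for satisfiability checking and
# simple-query answering.
def partition_abox(assertions, num_chunks):
    chunks = defaultdict(list)
    for assertion in assertions:
        if len(assertion) == 2:          # concept assertion: ("Person", "alice")
            _, individual = assertion
            chunks[hash(individual) % num_chunks].append(assertion)
        else:                            # role assertion: ("knows", "alice", "bob")
            _, subj, obj = assertion
            for individual in {subj, obj}:
                chunks[hash(individual) % num_chunks].append(assertion)
    return dict(chunks)

# --- Query partitioning: split a conjunctive query into simple-queries ---
# A simple-query is a set of atoms that all share one common variable or
# individual; atoms are greedily grouped around such a shared term.
def is_simple(atoms):
    term_sets = [set(atom[1:]) for atom in atoms]
    return bool(set.intersection(*term_sets)) if term_sets else True

def partition_query(atoms):
    groups = []
    for atom in atoms:
        for group in groups:
            if is_simple(group + [atom]):
                group.append(atom)
                break
        else:
            groups.append([atom])
    return groups

if __name__ == "__main__":
    abox = [("Person", "alice"), ("knows", "alice", "bob"), ("Person", "bob")]
    print(partition_abox(abox, num_chunks=2))

    query = [("knows", "?x", "?y"), ("Person", "?x"), ("worksAt", "?y", "?z")]
    # -> [[('knows','?x','?y'), ('Person','?x')], [('worksAt','?y','?z')]]
    print(partition_query(query))
```

The greedy grouping shown here is only one possible heuristic; in the paper, the definitions of KB partitions and query partitions, together with the identified necessary and sufficient conditions, govern which groupings preserve soundness and completeness.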



