Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing

Information Systems Frontiers (IF 5.9), Pub Date: 2020-03-04, DOI: 10.1007/s10796-020-09998-z
Abdallah Khelil, Amin Mesmoudi, Jorge Galicia, Ladjel Bellatreche, Mohand-Saïd Hacid, Emmanuel Coquery

The flexibility offered by the Resource Description Framework (RDF) has made it a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large-scale multidisciplinary datasets, which have prompted the development of efficient RDF processing systems. Current approaches fall into two groups: the first adopts the relational model and stores triples in tables, while the second builds data structures that model RDF data as a graph. Strategies in the first group scale more easily because they apply optimization techniques from the relational model, such as indexing and fragmentation. However, these approaches incur significant overhead on the complex queries (e.g., compound SPARQL graph patterns involving filters) that are common in existing applications. On the other hand, graph-based systems that use more complex data structures fail to manage main memory efficiently and do not scale on hardware with limited resources. In this paper, we propose a novel approach to evaluating queries (Basic Graph Patterns, Wildcards, Aggregations, and Sorting) on RDF data. We propose to combine RDF graph exploration with physical fragmentation of triples. In this work, we describe our graph-based storage and query evaluation models. Then, we detail the architecture of our system and explain at length the strategy, based on the Volcano execution model, used to manage main memory at query runtime. We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal. We compared our performance with a relational-based (Virtuoso), a graph-based (gStore), and an intensive-indexing (RDF-3X) approach. According to our evaluation, our system offers the best compromise between efficient query processing and scalability.
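The abstract names the Volcano execution model as the basis for managing main memory at query runtime, combined with graph exploration over fragmented triples. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes a hypothetical fragmentation of triples by predicate (each fragment indexed by subject) and evaluates a small Basic Graph Pattern with pull-based `open`/`next`/`close` operators. The names `fragment_by_predicate`, `Scan`, `Expand`, `Filter`, and `run` are illustrative only.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Binding = Dict[str, str]
Fragment = Dict[str, List[str]]  # hypothetical layout: subject -> objects


def fragment_by_predicate(triples: List[Tuple[str, str, str]]) -> Dict[str, Fragment]:
    """Split triples into one fragment per predicate, each indexed by subject."""
    fragments = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        fragments[p][s].append(o)
    return fragments


class Scan:
    """Leaf operator: emits one {s_var, o_var} binding per edge of a fragment."""

    def __init__(self, fragment: Fragment, s_var: str, o_var: str):
        self.fragment, self.s_var, self.o_var = fragment, s_var, o_var
        self._edges: Iterator[Tuple[str, str]] = iter(())

    def open(self) -> None:
        self._edges = ((s, o) for s, objs in self.fragment.items() for o in objs)

    def next(self) -> Optional[Binding]:
        edge = next(self._edges, None)
        return None if edge is None else {self.s_var: edge[0], self.o_var: edge[1]}

    def close(self) -> None:
        self._edges = iter(())


class Expand:
    """Graph exploration: follows `fragment` edges from an already-bound variable."""

    # Assumption of this sketch: `to_var` is not yet bound by the child (no cycle checks).
    def __init__(self, child, fragment: Fragment, from_var: str, to_var: str):
        self.child, self.fragment = child, fragment
        self.from_var, self.to_var = from_var, to_var
        self._current: Optional[Binding] = None
        self._targets: Iterator[str] = iter(())

    def open(self) -> None:
        self.child.open()
        self._current, self._targets = None, iter(())

    def next(self) -> Optional[Binding]:
        while True:
            target = next(self._targets, None)
            if target is not None:
                return {**self._current, self.to_var: target}
            # Targets of the current binding exhausted: pull one more from the child.
            self._current = self.child.next()
            if self._current is None:
                return None
            self._targets = iter(self.fragment.get(self._current[self.from_var], ()))

    def close(self) -> None:
        self.child.close()


class Filter:
    """Passes through only the bindings that satisfy a predicate (like SPARQL FILTER)."""

    def __init__(self, child, keep: Callable[[Binding], bool]):
        self.child, self.keep = child, keep

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Binding]:
        while (row := self.child.next()) is not None:
            if self.keep(row):
                return row
        return None

    def close(self) -> None:
        self.child.close()


def run(plan) -> Iterator[Binding]:
    """Drive the plan by pulling one binding at a time from the root operator."""
    plan.open()
    try:
        while (row := plan.next()) is not None:
            yield row
    finally:
        plan.close()


frags = fragment_by_predicate([
    ("alice", "knows", "bob"), ("alice", "knows", "carol"),
    ("bob", "knows", "carol"), ("bob", "age", "31"), ("carol", "age", "25"),
])
# Roughly: SELECT * WHERE { ?x :knows ?y . ?y :age ?a FILTER(?a > 30) }
plan = Filter(Expand(Scan(frags["knows"], "x", "y"), frags["age"], "y", "a"),
              lambda b: int(b["a"]) > 30)
results = list(run(plan))  # [{'x': 'alice', 'y': 'bob', 'a': '31'}]
# COUNT(*) over the unfiltered pattern, still pulling one binding at a time.
total = sum(1 for _ in run(Expand(Scan(frags["knows"], "x", "y"), frags["age"], "y", "a")))
```

In this sketch each operator holds only its current binding and an iterator position rather than a fully materialized intermediate result, which is the property the Volcano model is usually chosen for. The abstract does not describe how the system actually fragments triples or orders exploration steps, so this should be read only as an illustration of the pull-based evaluation style.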

Updated: 2020-04-21
