GoFast: Graph-based optimization for efficient and scalable query evaluation
Information Systems (IF 3.0), Pub Date: 2021-02-17, DOI: 10.1016/j.is.2021.101738
Ishaq Zouaghi, Amin Mesmoudi, Jorge Galicia, Ladjel Bellatreche, Taoufik Aguili

The popularity of the Resource Description Framework (RDF) and SPARQL has driven the development of high-performance systems for managing data represented in this model. Earlier approaches adapted the well-established relational model, reusing its storage, query processing, and optimization strategies. However, techniques borrowed from the relational model are not universally applicable in the RDF context. First, the schema-free nature of RDF induces heavy join overheads. Second, optimization strategies that search for the optimal join order rely on error-prone statistics that cannot capture all the correlations among triples. Graph-based approaches preserve the graph structure of RDF by representing the data directly as a graph. Their execution model relies on graph exploration operators to find subgraph matches for a query. Although they have been shown to outperform relational-based systems on complex queries, they scale poorly, and their optimization techniques are entirely system-dependent. Recently, some systems such as RDF_QDAG have shown that combining graph exploration with triple clustering achieves a good compromise between performance and scalability. In this paper, we propose optimization strategies for this kind of RDF management system. First, we define novel statistics collected for clusters of triples to better capture the dependencies found in the original graph. Second, we redefine an execution plan based on these logical structures so that it represents the RDF graph exploration process. Third, we introduce an algorithm that selects the optimal execution plan based on a customized cost model. Finally, we propose a new approach that refines the chosen plan by pruning invalid clusters, i.e., clusters that do not participate in the construction of the final query results. All our proposals are validated experimentally on well-known RDF benchmarks.
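
The abstract does not spell out GoFast's statistics, cost model, or pruning procedure. As a rough, hypothetical illustration of the general idea only (not the authors' algorithm), the Python sketch below assumes a chain query ?v0 p0 ?v1 . ?v1 p1 ?v2 ..., per-cluster statistics reduced to a triple count per predicate, and a set of cluster-to-cluster links; `Cluster`, `prune_clusters`, and `exploration_order` are invented names. Pruning repeatedly drops clusters that have no link to a surviving cluster of the adjacent pattern, and a toy heuristic (standing in for a real cost model) starts exploration at the pattern with the smallest estimated cardinality.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster summary: number of triples per predicate."""
    cid: str
    pred_counts: dict[str, int]


def prune_clusters(query: list[str], clusters: list[Cluster],
                   links: set[tuple[str, str]]) -> list[set[str]]:
    """Cluster-level pruning for a chain query (one predicate per pattern).

    links contains (a, b) when some object in cluster a is a subject in
    cluster b (an assumed, simplified connectivity statistic).
    """
    # Candidate clusters for each pattern: those that contain its predicate.
    cand = [{c.cid for c in clusters if p in c.pred_counts} for p in query]
    changed = True
    while changed:  # repeat until no adjacent pair of patterns changes
        changed = False
        for i in range(len(query) - 1):
            # Keep a left cluster only if it links to some surviving right cluster...
            left = {a for a in cand[i] if any((a, b) in links for b in cand[i + 1])}
            # ...and a right cluster only if some surviving left cluster links to it.
            right = {b for b in cand[i + 1] if any((a, b) in links for a in left)}
            if left != cand[i] or right != cand[i + 1]:
                cand[i], cand[i + 1] = left, right
                changed = True
    return cand


def exploration_order(query: list[str], cand: list[set[str]],
                      by_id: dict[str, Cluster]) -> tuple[list[int], list[int]]:
    """Toy heuristic (assumption): start at the pattern with the fewest
    estimated triples in its surviving clusters, then extend the explored
    chain toward the cheaper adjacent pattern."""
    cards = [sum(by_id[c].pred_counts[p] for c in cand[i])
             for i, p in enumerate(query)]
    start = min(range(len(query)), key=lambda i: cards[i])
    order, lo, hi = [start], start, start
    while len(order) < len(query):
        options = []
        if lo > 0:
            options.append(lo - 1)
        if hi < len(query) - 1:
            options.append(hi + 1)
        nxt = min(options, key=lambda i: cards[i])
        order.append(nxt)
        lo, hi = min(lo, nxt), max(hi, nxt)
    return order, cards


# Usage with made-up statistics for ?x worksFor ?y . ?y locatedIn ?z
clusters = [Cluster("C1", {"worksFor": 100}), Cluster("C2", {"worksFor": 50}),
            Cluster("C3", {"locatedIn": 10}), Cluster("C4", {"locatedIn": 500})]
links = {("C1", "C3")}
query = ["worksFor", "locatedIn"]
cand = prune_clusters(query, clusters, links)
order, cards = exploration_order(query, cand, {c.cid: c for c in clusters})
print(cand, order, cards)  # → [{'C1'}, {'C3'}] [1, 0] [100, 10]
```

Here C2 and C4 are pruned because no link connects them to a cluster of the other pattern, so the estimate for each pattern counts only surviving clusters and exploration begins at locatedIn. A real optimizer would use richer statistics and compare full plans by cost rather than this greedy rule.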

Updated: 2021-02-26