A data distribution model for RDF, Distributed and Parallel Databases



A data distribution model for RDF
Distributed and Parallel Databases (IF 1.5), Pub Date: 2020-05-16, DOI: 10.1007/s10619-020-07296-w
Rebeca Schroeder, Raqueline R. M. Penteado, Carmem S. Hara

The ever-increasing amount of available RDF data requires that data be partitioned across multiple servers. Some research progress has been made towards scaling RDF query processing based on suitable data distribution methods. In general, these methods work well for queries matching simple triple patterns, but they are inefficient for queries involving more complex patterns. In this paper, we present an RDF data distribution method that overcomes the shortcomings of current approaches in order to scale RDF storage in terms of both data volume and query processing. We apply a method that identifies frequent patterns accessed by queries in order to keep related data in the same partition. We perform our reasoning on a summarized view of the data in order to avoid exhaustive analysis of large datasets. As a result, partitioning templates are obtained from data items in an RDF structure. In addition, we provide an approach for dynamic data insertions, even if new data do not conform to the original RDF structure. Apart from repartitioning approaches, we use an overflow repository to store data that may not follow the original schema. Our study shows that our method scales well and effectively improves overall performance by decreasing the amount of message passing among servers, compared to alternative data distribution approaches for RDF.
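The general idea in the abstract (group data that queries frequently access together, and send non-conforming data to an overflow store) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: treating predicates as the summarized view, the `min_support` threshold, the union-find grouping, the CRC32 placement, and the names `build_templates` and `distribute` are all assumptions made for illustration.

```python
import zlib
from collections import Counter
from itertools import combinations


def build_templates(workload, min_support):
    """Group predicates that queries frequently access together.

    `workload` is a list of queries, each given as a list of predicates.
    Returns a dict mapping predicate -> template id (hypothetical scheme).
    """
    pair_counts = Counter()
    for query_predicates in workload:
        for pair in combinations(sorted(set(query_predicates)), 2):
            pair_counts[pair] += 1

    parent = {}

    def find(p):
        # Union-find lookup with path halving.
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Merge predicates whose co-access count reaches the threshold.
    for (a, b), count in pair_counts.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    return {p: find(p) for p in list(parent)}


def distribute(triples, templates, num_servers):
    """Place triples on servers; unknown predicates go to overflow."""
    partitions = [[] for _ in range(num_servers)]
    overflow = []
    for s, p, o in triples:
        template = templates.get(p)
        if template is None:
            # Data that does not fit any template (assumed overflow rule).
            overflow.append((s, p, o))
        else:
            # Same subject and same template land on the same server.
            key = f"{s}|{template}".encode("utf-8")
            partitions[zlib.crc32(key) % num_servers].append((s, p, o))
    return partitions, overflow


# Toy workload and data, invented for this example.
workload = [
    ["name", "email"], ["name", "email"], ["name", "age"],
    ["title", "year"], ["title", "year"],
]
triples = [
    ("alice", "name", "Alice"), ("alice", "email", "a@x.org"),
    ("alice", "age", "30"),
    ("doc1", "title", "RDF"), ("doc1", "year", "2020"),
]
templates = build_templates(workload, min_support=2)
partitions, overflow = distribute(triples, templates, num_servers=3)
```

In this toy run, `name` and `email` end up in one template, so both of Alice's matching triples share a server and a query over them needs no cross-server messages, while the `age` triple, which belongs to no frequent pattern, is kept in the overflow list.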


Updated: 2020-05-16
