Granite: A distributed engine for scalable path queries over temporal property graphs
Journal of Parallel and Distributed Computing (IF 3.8), Pub Date: 2021-02-14, DOI: 10.1016/j.jpdc.2021.02.004
Shriram Ramesh, Animesh Baranawal, Yogesh Simmhan

Property graphs are a common form of linked data, with path queries used to traverse and explore them for enterprise transactions and mining. Temporal property graphs are a recent variant in which time is a first-class entity to be queried over, and both properties and structure vary over time. They are seen in social, telecom, transit and epidemic networks. However, current graph databases and query engines have limited support for temporal relations among graph entities, have no support for time-varying entities, and/or do not scale on distributed resources. We address this gap by extending a linear path query model over property graphs to include intuitive temporal predicates and aggregation operators over temporal graphs. We design a distributed execution model for these temporal path queries using the interval-centric computing model, and develop a novel cost model to select an efficient execution plan from several candidates. We perform detailed experiments on our Granite distributed query engine using both static and dynamic temporal property graphs as large as 52M vertices, 218M edges and 325M properties, and a 1600-query workload derived from the LDBC benchmark. We often offer sub-second query latencies on a commodity cluster, which is 149×–1140× faster than the industry-leading Neo4J shared-memory graph database and the JanusGraph/Spark distributed graph query engine. Granite also completes 100% of the queries for all graphs, compared to only 32–92% workload completion by the baseline systems. Further, our cost model selects a query plan that is within 10% of the optimal execution time in 90% of the cases. Despite the irregular nature of graph processing, we exhibit a weak-scaling efficiency of 60% on 8 nodes and 40% on 16 nodes for most query workloads.




Updated: 2021-02-15