RisGraph: A Real-Time Streaming System for Evolving Graphs
arXiv - CS - Databases, Pub Date: 2020-04-02, DOI: arxiv-2004.00803
Guanyu Feng, Zixuan Ma, Daixuan Li, Xiaowei Zhu, Yanzheng Cai, Wentao Han, Wenguang Chen

Graphs in the real world are large and constantly changing. When processing these evolving graphs, the combination of update workloads (updating vertices and edges in a streaming manner) and analytical workloads (running graph algorithms incrementally) is ubiquitous. Throughput, latency, and granularity are three key requirements in processing evolving graphs with such combined workloads. Although several streaming systems have been proposed for evolving graphs to improve latency, they usually use a batch-update model, which improves throughput but hurts granularity. It is still challenging to fulfill all three requirements simultaneously, especially for power-law graphs, because they are difficult to partition. We analyze the computational cost on synthesized power-law graphs and realistic evolving graphs from public datasets. We find that the area affected by each update is usually small, and that there are scheduling opportunities for combined workloads. Based on these observations, we design RisGraph, a real-time streaming system for incremental graph computing. Its novel scheduling design, together with trade-offs in the data structures and the computing engine, lets RisGraph satisfy the three requirements at the same time. The evaluation shows that RisGraph can ingest millions of updates per second, with a 99.9th-percentile latency within 20 milliseconds, for graphs with hundreds of millions of vertices and billions of edges on a single commodity machine.

Updated: 2020-04-03