GraphPi: High Performance Graph Pattern Matching through Effective Redundancy Elimination,arXiv - CS - Databases


GraphPi: High Performance Graph Pattern Matching through Effective Redundancy Elimination
arXiv - CS - Databases, Pub Date: 2020-09-23, DOI: arxiv-2009.10955
Tianhui Shi, Mingshu Zhai, Yi Xu, Jidong Zhai

Graph pattern matching, which aims to discover structural patterns in graphs, is considered one of the most fundamental graph mining problems in many real applications. Despite previous efforts, existing systems face two main challenges. First, the inherent symmetry of patterns can introduce a large amount of redundant computation. Second, different matching orders for the same pattern differ significantly in performance, and those differences are hard to predict. When these two factors combine, the problem becomes extremely complicated, and highly efficient pattern matching remains an open problem. To address these challenges, we propose GraphPi, a high-performance distributed pattern matching system. GraphPi uses a new algorithm based on 2-cycles in group theory to generate multiple sets of asymmetric restrictions, where each set can eliminate redundant computation completely. We further design an accurate performance model to determine the optimal matching order and asymmetric restriction set for efficient pattern matching. We evaluate GraphPi on the Tianhe-2A supercomputer. Results show that GraphPi outperforms the state-of-the-art system by up to 105X for 6 real-world graph datasets on a single node. We also scale GraphPi to 1,024 computing nodes (24,576 cores).
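To make the redundancy caused by pattern symmetry concrete, here is a minimal Python sketch for the triangle pattern. It is not GraphPi's algorithm: the restriction set used here (vertex IDs must satisfy a < b < c) is a standard hand-picked symmetry-breaking choice, and the function names `build_adjacency` and `count_embeddings` are hypothetical. The abstract only states that GraphPi generates such restriction sets automatically and chooses among them with a performance model.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_embeddings(adj, restricted):
    """Count embeddings (a, b, c) of the triangle pattern.

    Without restrictions every triangle is found once per automorphism
    of the pattern (3! = 6 times). With restricted=True only embeddings
    with a < b < c are kept, so each triangle is found exactly once.
    """
    count = 0
    for a in adj:
        for b in adj[a]:
            if restricted and not a < b:
                continue  # pruned early: restriction a < b
            # c must be adjacent to both a and b
            for c in adj[a] & adj[b]:
                if restricted and not b < c:
                    continue  # pruned: restriction b < c
                count += 1
    return count


# Complete graph on 4 vertices, which contains 4 distinct triangles.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
adj = build_adjacency(edges)

unrestricted = count_embeddings(adj, restricted=False)  # 24 = 6 * 4
restricted = count_embeddings(adj, restricted=True)     # 4
print(unrestricted, restricted)
```

The unrestricted search reports each triangle six times, once per automorphism of the pattern, while the restricted search reports it once and also skips the inner loop whenever the first restriction already fails. Which restrictions to apply, and in which order to match the pattern vertices, is the choice that the abstract says GraphPi makes with its performance model.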


Updated: 2020-09-30
