Parallel algorithms for finding connected components using linear algebra
Journal of Parallel and Distributed Computing (IF 3.8), Pub Date: 2020-05-19, DOI: 10.1016/j.jpdc.2020.04.009
Yongzhe Zhang, Ariful Azad, Aydın Buluç

Finding connected components is one of the most widely used operations on a graph. Optimal serial algorithms for the problem have been known for half a century, and many competing parallel algorithms have been proposed over the last several decades under various models of parallel computation. This paper presents a class of parallel connected-component algorithms designed using linear-algebraic primitives. These algorithms are based on a PRAM algorithm by Shiloach and Vishkin and can be expressed using standard GraphBLAS operations. We demonstrate two algorithms of this class: LACC (Linear Algebraic Connected Components) and FastSV, which can be regarded as a simplification of LACC. Built on the highly scalable Combinatorial BLAS library, LACC and FastSV outperform the previous state-of-the-art algorithm by a factor of up to 12x on small to medium-scale graphs. For large graphs with more than 50 billion edges, LACC and FastSV scale to 4K nodes (262K cores) of a Cray XC40 supercomputer and outperform previous algorithms by a significant margin. This performance is achieved by (1) exploiting sparsity that was not present in the original PRAM algorithm formulation, (2) using high-performance primitives of Combinatorial BLAS, and (3) identifying hot spots and optimizing them away by exploiting algorithmic insights.
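The abstract does not spell out the algorithms, so the following is only a minimal serial sketch of the hook-and-shortcut idea that Shiloach-Vishkin style algorithms build on. It is not LACC or FastSV, and it uses no GraphBLAS or Combinatorial BLAS calls. The function name `connected_components`, the edge-list input, and the rule of always hooking the larger root onto the smaller label are assumptions made for illustration. In a linear-algebraic formulation, the per-edge hooking loop would roughly correspond to a sparse matrix-vector product over the adjacency matrix, and the shortcutting loop to an indexed vector lookup.

```python
def connected_components(n, edges):
    """Label each of n vertices with the smallest vertex id in its component.

    Illustrative hook-and-shortcut sketch (hypothetical helper, not the
    paper's LACC or FastSV). `edges` is an iterable of undirected (u, v) pairs.
    """
    parent = list(range(n))  # every vertex starts as its own root
    changed = True
    while changed:
        changed = False
        # Hooking: for each edge joining two different trees, attach the
        # root with the larger label to the smaller label.
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if pu == pv:
                continue
            hi, lo = max(pu, pv), min(pu, pv)
            if parent[hi] == hi:  # only roots may be hooked
                parent[hi] = lo
                changed = True
        # Shortcutting (pointer jumping): point each vertex at its
        # grandparent, which flattens the trees toward stars.
        for u in range(n):
            grandparent = parent[parent[u]]
            if grandparent != parent[u]:
                parent[u] = grandparent
                changed = True
    return parent
```

Calling `connected_components(6, [(0, 1), (1, 2), (3, 4)])` labels vertices 0 to 2 with 0, vertices 3 and 4 with 3, and leaves the isolated vertex 5 labeled 5. Because parent labels only ever decrease, the loop terminates, and at termination every tree is a star whose root is the smallest vertex id in its component.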




Updated: 2020-05-19