A Distributed-Memory Algorithm for Computing a Heavy-Weight Perfect Matching on Bipartite Graphs
SIAM Journal on Scientific Computing (IF 3.0), Pub Date: 2020-08-05, DOI: 10.1137/18m1189348
Ariful Azad, Aydin Buluç, Xiaoye S. Li, Xinliang Wang, Johannes Langguth

SIAM Journal on Scientific Computing, Volume 42, Issue 4, Page C143-C168, January 2020.
We design and implement an efficient parallel algorithm for finding a perfect matching in a weighted bipartite graph such that weights on the edges of the matching are large. This problem differs from the maximum weight matching problem, for which scalable approximation algorithms are known. It is primarily motivated by finding good pivots in scalable sparse direct solvers before factorization. Due to the lack of scalable alternatives, distributed solvers use sequential implementations of maximum weight perfect matching algorithms, such as those available in MC64. To overcome this limitation, we propose a fully parallel distributed memory algorithm that first generates a perfect matching and then iteratively improves the weight of the perfect matching by searching for weight-increasing cycles of length 4 in parallel. For most practical problems the weights of the perfect matchings generated by our algorithm are very close to the optimum. An efficient implementation of the algorithm scales up to 256 nodes (17,408 cores) on a Cray XC40 supercomputer and can solve instances that are too large to be handled by a single node using the sequential algorithm.
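The improvement step described in the abstract can be illustrated with a small sequential sketch. This is not the authors' distributed implementation. It assumes a square dense weight matrix `w` in which absent edges are `-inf`, and an initial perfect matching given as a list `mate`, where row `i` is matched to column `mate[i]`. The function names `improve_by_4_cycles` and `matching_weight` are hypothetical. A cycle of length 4 consists of two matched edges (i, j) and (k, l) plus the unmatched edges (i, l) and (k, j). Swapping partners along the cycle keeps the matching perfect, and it increases the weight whenever w[i][l] + w[k][j] > w[i][j] + w[k][l].

```python
import math

NO_EDGE = -math.inf  # assumption: absent edges are encoded as -inf


def matching_weight(w, mate):
    """Total weight of the matching that pairs row i with column mate[i]."""
    return sum(w[i][mate[i]] for i in range(len(mate)))


def improve_by_4_cycles(w, mate):
    """Greedy sequential sketch: swap along weight-increasing 4-cycles
    until none remains. Returns a new list; the input is not modified."""
    n = len(w)
    mate = list(mate)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for k in range(i + 1, n):
                j, l = mate[i], mate[k]
                # Gain from replacing (i, j), (k, l) with (i, l), (k, j).
                # If either new edge is absent, the gain is -inf and is skipped.
                gain = w[i][l] + w[k][j] - w[i][j] - w[k][l]
                if gain > 0:
                    mate[i], mate[k] = l, j
                    improved = True
    return mate


# Tiny example: the identity matching has weight 7; one swap raises it to 8.
w = [
    [3, 2, NO_EDGE],
    [2, 3, 1],
    [NO_EDGE, 4, 1],
]
start = [0, 1, 2]
result = improve_by_4_cycles(w, start)
print(result, matching_weight(w, result))
```

Every accepted swap strictly increases the total weight, so with exact (for example integer) weights the loop terminates. The result is only locally optimal with respect to 4-cycles, which is consistent with the abstract's claim of weights "very close to the optimum" rather than guaranteed optimal.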


Updated: 2020-08-05