Accelerating Sparse Cholesky Factorization on Sunway Manycore Architecture
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-07-01, DOI: 10.1109/tpds.2019.2953852
Mingzhen Li, Yi Liu, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian

To improve the performance of sparse Cholesky factorization, existing research groups adjacent columns of the sparse matrix that share the same nonzero pattern into supernodes for parallelization. However, because sparse matrices vary widely in structure, the amount of computation per generated supernode varies significantly, which makes the supernodes hard to optimize when they are computed by dense matrix kernels. Therefore, how to efficiently map sparse Cholesky factorization to emerging architectures, such as the Sunway many-core processor, remains an active research direction. In this article, we propose swCholesky, a highly optimized implementation of sparse Cholesky factorization on the Sunway processor. Specifically, we design three kernel task queues and a dense matrix library to dynamically adapt to the kernel characteristics and architecture features. In addition, we propose an auto-tuning mechanism to search for the optimal settings of the important parameters in swCholesky. Our experiments show that swCholesky achieves better performance than state-of-the-art implementations.
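The supernode grouping mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook criterion for supernodes (column j and column j+1 of the factor belong together when the pattern of column j, minus its diagonal entry, equals the pattern of column j+1); the abstract does not say which exact rule swCholesky uses. The function name `find_supernodes` and the input layout are hypothetical, chosen only for illustration.

```python
def find_supernodes(col_patterns):
    """Group adjacent columns of a lower-triangular factor into supernodes.

    col_patterns[j] is the set of row indices holding nonzeros in column j,
    diagonal included. This layout is an assumption made for this sketch.
    Returns a list of (first_col, last_col) ranges, both inclusive.
    """
    supernodes = []
    start = 0
    n = len(col_patterns)
    for j in range(1, n + 1):
        # Close the current supernode at the end of the matrix, or when
        # column j does not continue the pattern of column j - 1.
        if j == n or col_patterns[j - 1] - {j - 1} != col_patterns[j]:
            supernodes.append((start, j - 1))
            start = j
    return supernodes


# Hypothetical 5x5 factor: columns 0-2 share one pattern, columns 3-4 another.
patterns = [
    {0, 1, 2, 4},
    {1, 2, 4},
    {2, 4},
    {3, 4},
    {4},
]
print(find_supernodes(patterns))
```

Each resulting range can be stored as a dense block and handed to dense kernels. Because the ranges differ in width and height from matrix to matrix, the work per supernode is uneven, which is the load-variation problem the abstract describes.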

Updated: 2020-07-01