A parallel structured divide-and-conquer algorithm for symmetric tridiagonal eigenvalue problems
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2021-02-01, DOI: 10.1109/tpds.2020.3019471
Xia Liao, Shengguo Li, Yutong Lu, Jose E. Roman

In this article, a parallel structured divide-and-conquer (PSDC) eigensolver for symmetric tridiagonal matrices is proposed, based on ScaLAPACK and a parallel structured matrix multiplication algorithm called PSMMA. Computing the eigenvectors via matrix-matrix multiplications is the most computationally expensive part of the divide-and-conquer algorithm, and one of the matrices involved in these multiplications is a rank-structured Cauchy-like matrix. By exploiting this property, PSMMA constructs the local matrices from the generators of the Cauchy-like matrices without any communication, and further reduces the computation cost by using a structured low-rank approximation algorithm. Thus, both the communication and computation costs are reduced. Experimental results show that both PSMMA and PSDC are highly scalable and scale to at least 4096 processes. PSDC has better scalability than PHDC, which was proposed in [16] and scaled to only 300 processes for the same matrices. Compared with PDSTEDC in ScaLAPACK, PSDC is always faster and achieves a 1.4x–1.6x speedup for some matrices with few deflations. PSDC is also comparable with ELPA: it is faster than ELPA when using few processes and a little slower when using many processes.
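
The communication-free construction mentioned in the abstract rests on a simple fact: each entry of a Cauchy-like matrix depends only on its generators, so a process that holds the generator vectors can form any block it owns locally. The following is a minimal sketch of that idea, not the authors' PSMMA code. It assumes the common generator form C[i][j] = u[i] * v[j] / (d[i] - w[j]), and it uses a simplified 1-D block-cyclic index map per grid dimension in the spirit of ScaLAPACK's layout. All function and variable names here are hypothetical, and the structured low-rank approximation step is not shown.

```python
def block_cyclic_indices(n, block, nprocs, rank):
    """Global indices owned by `rank` in a 1-D block-cyclic layout.

    Hypothetical helper: blocks of `block` consecutive indices are dealt
    out to `nprocs` processes in round-robin order.
    """
    return [g for g in range(n) if (g // block) % nprocs == rank]


def local_cauchy_block(u, v, d, w, rows, cols):
    """Build the sub-block C[rows, cols] of a Cauchy-like matrix.

    Assumed generator form: C[i][j] = u[i] * v[j] / (d[i] - w[j]).
    Only the generator vectors are read, so no other process is involved.
    """
    return [[u[i] * v[j] / (d[i] - w[j]) for j in cols] for i in rows]


# Illustrative generators; d and w are chosen so that d[i] != w[j].
n = 8
d = [float(i) for i in range(n)]
w = [i + 0.5 for i in range(n)]
u = [1.0 + 0.1 * i for i in range(n)]
v = [2.0 - 0.1 * j for j in range(n)]

# Reference: the full matrix, as a single process would build it.
full = local_cauchy_block(u, v, d, w, range(n), range(n))

# Process at grid position (1, 0) in a 2 x 2 grid with block size 2.
rows = block_cyclic_indices(n, block=2, nprocs=2, rank=1)
cols = block_cyclic_indices(n, block=2, nprocs=2, rank=0)
local = local_cauchy_block(u, v, d, w, rows, cols)
```

In this sketch every process needs only four vectors of length n rather than an n-by-n matrix, which is why forming a local block requires no data exchange.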

Updated: 2021-02-01