PaScaL_TDMA: A library of parallel and scalable solvers for massive tridiagonal systems
Computer Physics Communications (IF 6.3), Pub Date: 2021-03-01, DOI: 10.1016/j.cpc.2020.107722
Ki-Ha Kim, Ji-Hoon Kang, Xiaomin Pan, Jung-Il Choi

Abstract

The aim of this study is to devise an efficient and scalable computational procedure for solving the many tridiagonal systems that arise in multi-dimensional partial differential equations. A modified Thomas algorithm and a newly designed communication scheme are used to reduce the communication overhead incurred while solving these systems. Benchmark tests show that, compared with methods based on global all-to-all communication, the proposed procedure significantly reduces communication time, and that this advantage becomes more prominent for larger problem sizes and greater numbers of cores. The proposed computational procedures are fully implemented in an open-source library called Parallel and Scalable Library for TDMA (PaScaL_TDMA). Considering a three-dimensional heat conduction problem as a practical example, we obtain good strong and weak scalability results up to 262,144 computing cores on the KISTI Nurion cluster system, which, to the best of our knowledge, is the largest parallel simulation for solving tridiagonal systems. The potential of this library for large-scale substantive problems in physics is also demonstrated through direct numerical simulations of the Rayleigh–Bénard convection problem, which yielded excellent scalability and accurate results.

Program summary

Program Title: PaScaL_TDMA
CPC Library link to program files: http://dx.doi.org/10.17632/49z6fh94z3.1
Developer's repository link: https://github.com/MPMC-Lab
Licensing provisions: MIT
Programming language: Fortran 90
Nature of problem: Tridiagonal systems for solving multi-dimensional partial differential equations.
Solution method: The divide-and-conquer method is used to solve partitioned tridiagonal systems of equations on a distributed-memory system. The partitioned tridiagonal systems are transformed into modified sub-matrices using the modified Thomas algorithm [1]. Reduced tridiagonal systems are constructed by collecting the first and last rows of the modified sub-matrices from each computing core. The newly designed communication scheme, based on MPI_Alltoallw, accelerates the collection of these rows and the construction of the reduced tridiagonal systems. The reduced tridiagonal systems are solved with the sequential Thomas algorithm. Thereafter, the remaining unknowns in the modified sub-matrices are obtained from the solutions of the reduced tridiagonal systems.
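
The solution method described above can be illustrated with a small serial sketch in Python. It is not the library's Fortran code or API: the function names (`thomas`, `modify_block`, `solve_partitioned`) are hypothetical, plain Python lists stand in for the per-core partitions, and the MPI_Alltoallw exchange is replaced by simply gathering rows in a loop. The elimination formulas are one common way of writing a modified Thomas sweep and are an assumption, not a transcription of reference [1]. The sketch also assumes a single system whose size is divisible by the number of blocks, with at least three rows per block.

```python
def thomas(a, b, c, d):
    """Sequential Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[-1] are ignored (no neighbours outside the system)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        r = 1.0 / (b[i] - a[i] * cp[i - 1])
        cp[i] = r * c[i]
        dp[i] = r * (d[i] - a[i] * dp[i - 1])
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def modify_block(a, b, c, d):
    """Modified Thomas sweep on one partition (hypothetical helper).
    Returns (a, c, d) with unit diagonal, where afterwards
      row 0    couples the previous block's last unknown, x[0] and x[-1];
      row i    couples x[0], x[i] and x[-1]          (interior rows);
      row n-1  couples x[0], x[-1] and the next block's first unknown."""
    n = len(d)
    assert n >= 3, "this sketch assumes at least three rows per block"
    a, c, d = list(a), list(c), list(d)
    for i in (0, 1):                           # normalise the first two rows
        a[i], c[i], d[i] = a[i] / b[i], c[i] / b[i], d[i] / b[i]
    for i in range(2, n):                      # forward: lower entries -> x[0] column
        r = 1.0 / (b[i] - a[i] * c[i - 1])
        d[i] = r * (d[i] - a[i] * d[i - 1])
        c[i] = r * c[i]
        a[i] = -r * a[i] * a[i - 1]
    for i in range(n - 3, 0, -1):              # backward: upper entries -> x[-1] column
        d[i] -= c[i] * d[i + 1]
        a[i] -= c[i] * a[i + 1]
        c[i] = -c[i] * c[i + 1]
    r = 1.0 / (1.0 - a[1] * c[0])              # eliminate x[1] from row 0
    d[0] = r * (d[0] - c[0] * d[1])
    a[0] = r * a[0]
    c[0] = -r * c[0] * c[1]
    return a, c, d


def solve_partitioned(a, b, c, d, nblocks):
    """Divide-and-conquer solve; each block plays the role of one computing core."""
    n = len(d)
    size = n // nblocks
    assert size * nblocks == n, "this sketch assumes equally sized blocks"
    blocks = [modify_block(a[s:s + size], b[s:s + size], c[s:s + size], d[s:s + size])
              for s in range(0, n, size)]
    # Reduced system: first and last row of every modified block.
    # (In a distributed setting this gathering step is the communication.)
    ra, rc, rd = [], [], []
    for ba, bc, bd in blocks:
        for i in (0, -1):
            ra.append(ba[i])
            rc.append(bc[i])
            rd.append(bd[i])
    y = thomas(ra, [1.0] * len(rd), rc, rd)
    # Recover the interior unknowns of each block from its two boundary values.
    x = []
    for k, (ba, bc, bd) in enumerate(blocks):
        first, last = y[2 * k], y[2 * k + 1]
        x.append(first)
        x.extend(bd[i] - ba[i] * first - bc[i] * last for i in range(1, size - 1))
        x.append(last)
    return x


if __name__ == "__main__":
    n = 12
    a = [0.0] + [-1.0] * (n - 1)               # lower diagonal
    b = [4.0] * n                              # main diagonal
    c = [-1.0] * (n - 1) + [0.0]               # upper diagonal
    d = [float(i + 1) for i in range(n)]
    x_ref = thomas(a, b, c, d)
    x_par = solve_partitioned(a, b, c, d, nblocks=3)
    print(max(abs(p - q) for p, q in zip(x_par, x_ref)))
```

For a diagonally dominant test system such as the one in the `__main__` block, the partitioned result should agree with the direct Thomas solve to within rounding error. The point of the decomposition is that each block's sweep and its final recovery step are independent of the other blocks, so only the two boundary rows per block need to be exchanged.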

Updated: 2021-03-01