Scalable Parallel Linear Solver for Compact Banded Systems on Heterogeneous Architectures
Journal of Computational Physics (IF 4.1), Pub Date: 2022-07-19, DOI: 10.1016/j.jcp.2022.111443
Hang Song, Kristen V. Matsuno, Jacob R. West, Akshay Subramaniam, Aditya S. Ghate, Sanjiva K. Lele

A scalable algorithm for solving compact banded linear systems on distributed memory architectures is presented. The proposed method factorizes the original system into two levels of memory hierarchies, and solves it using parallel cyclic reduction on both distributed and shared memory. This method has a lower communication footprint across distributed memory partitions than conventional algorithms involving data transposes or re-partitioning. The algorithm developed in this work is generalized to cyclic compact banded systems with flexible data decompositions. For cyclic compact banded systems, the method is a direct solver with deterministic operation and communication counts that depend on the matrix size, its bandwidth, and the partition strategy. The implementation and runtime configuration details are discussed for performance optimization. Scalability is demonstrated on the linear solver as well as on a representative fluid mechanics application problem, in which the dominant computational cost is solving the cyclic tridiagonal linear systems of compact numerical schemes on a 3D periodic domain. The algorithm is particularly useful for solving the linear systems arising from the application of compact finite difference operators to a wide range of partial differential equation problems, including, but not limited to, numerical simulations of compressible turbulent flows, aeroacoustics, elastic-plastic wave propagation, and electromagnetics. It alleviates obstacles to the use of these operators on modern high performance computing hardware, where memory and computational power are distributed across nodes with multi-threaded processing units.
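
As a rough illustration of the parallel cyclic reduction (PCR) building block named in the abstract, the following is a minimal single-process sketch in plain Python for a cyclic tridiagonal system. It is not the authors' two-level distributed algorithm: the function names `pcr_cyclic_tridiagonal` and `residual`, the restriction to a power-of-two system size, and the coefficient values in the demo are all assumptions made here for illustration. Pivoting is not performed, so the sketch is only appropriate for well-conditioned (for example, diagonally dominant) systems.

```python
def pcr_cyclic_tridiagonal(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], indices taken mod n.

    Illustrative sketch only: assumes n is a power of two (n >= 2) and that
    no pivoting is needed. All names here are hypothetical.
    """
    n = len(b)
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two, n >= 2")
    a, b, c, d = list(a), list(b), list(c), list(d)

    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        # Each row i is updated independently of the others within a step;
        # this loop is the part a parallel implementation would distribute.
        for i in range(n):
            m = (i - stride) % n  # row used to eliminate x[i - stride]
            p = (i + stride) % n  # row used to eliminate x[i + stride]
            alpha = -a[i] / b[m]
            gamma = -c[i] / b[p]
            na[i] = alpha * a[m]                          # couples to x[i - 2*stride]
            nb[i] = b[i] + alpha * c[m] + gamma * a[p]
            nc[i] = gamma * c[p]                          # couples to x[i + 2*stride]
            nd[i] = d[i] + alpha * d[m] + gamma * d[p]
        a, b, c, d = na, nb, nc, nd
        stride *= 2

    # After log2(n) steps both off-diagonal terms of row i wrap around to
    # x[i] itself, so each row reads (a[i] + b[i] + c[i]) * x[i] = d[i].
    return [d[i] / (a[i] + b[i] + c[i]) for i in range(n)]


def residual(a, b, c, d, x):
    """Largest absolute residual of the cyclic tridiagonal system."""
    n = len(x)
    return max(
        abs(a[i] * x[(i - 1) % n] + b[i] * x[i] + c[i] * x[(i + 1) % n] - d[i])
        for i in range(n)
    )


if __name__ == "__main__":
    n = 8
    a = [1.0 / 3.0] * n          # illustrative, diagonally dominant coefficients
    b = [1.0] * n
    c = [1.0 / 3.0] * n
    d = [float(i) for i in range(n)]
    x = pcr_cyclic_tridiagonal(a, b, c, d)
    print("max residual:", residual(a, b, c, d, x))
```

The number of reduction steps is log2(n) and each step touches every row once, which is consistent with the abstract's point that a direct solver of this kind has operation counts fixed by the matrix size rather than by the data. How the reduction is split between distributed and shared memory is the subject of the paper and is not reproduced here.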




Updated: 2022-07-19