Two-Stage Gauss-Seidel Preconditioners and Smoothers for Krylov Solvers on a GPU cluster
arXiv - CS - Mathematical Software, Pub Date: 2021-04-02, DOI: arxiv-2104.01196
Luc Berger-Vergiat, Brian Kelley, Sivasankaran Rajamanickam, Jonathan Hu, Katarzyna Swirydowicz, Paul Mullowney, Stephen Thomas, Ichitaro Yamazaki

Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the sparse triangular solve it requires is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of traditional GS relaxation based on a triangular solve is compared with two-stage variants, which replace the direct triangular solve with a fixed number of inner Jacobi-Richardson (JR) iterations. When a small number of inner iterations is sufficient to maintain the Krylov convergence rate, two-stage GS (GS2) often outperforms the traditional algorithm on many-core architectures. We also compare GS2 with JR. When the two perform the same number of flops for SpMV (e.g. three JR sweeps compared to two GS sweeps with one inner JR sweep), the GS2 iterations, and the Krylov solver preconditioned with GS2, may converge faster than the JR iterations. Moreover, for some problems (e.g. elasticity), it was found that JR may diverge with a damping factor of one, whereas two-stage GS may improve the convergence with more inner iterations. Finally, to study the performance of the two-stage smoother and preconditioner for a practical problem, these were applied to incompressible fluid flow simulations on GPUs.
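
To make the two-stage idea concrete, here is a minimal dense NumPy sketch. It is not the authors' GPU implementation, and the function names `two_stage_gs` and `classic_gs` are hypothetical. It assumes the splitting A = L + D + U (strictly lower, diagonal, strictly upper), a forward sweep, a damping factor of one, and an inner JR recurrence of the form g ← D⁻¹(r − L g) started from g = D⁻¹ r. The abstract does not spell out these details, so treat them as assumptions.

```python
import numpy as np

def two_stage_gs(A, b, x0, outer_sweeps, inner_sweeps):
    """Two-stage GS sketch: approximate (L + D)^{-1} r with inner JR sweeps."""
    D = np.diag(A)               # diagonal of A as a vector
    L = np.tril(A, k=-1)         # strictly lower triangular part of A
    x = np.array(x0, dtype=float)
    for _ in range(outer_sweeps):
        r = b - A @ x            # outer residual (one matrix-vector product)
        g = r / D                # initial guess; zero inner sweeps gives plain Jacobi
        for _ in range(inner_sweeps):
            g = (r - L @ g) / D  # inner JR sweep (product with L only)
        x = x + g
    return x

def classic_gs(A, b, x0, sweeps):
    """Reference GS sweep; a dense solve stands in for the sparse triangular solve."""
    M = np.tril(A)               # L + D
    x = np.array(x0, dtype=float)
    for _ in range(sweeps):
        x = x + np.linalg.solve(M, b - A @ x)
    return x

if __name__ == "__main__":
    n = 8
    # 1D Laplacian test matrix (an assumption chosen for illustration only)
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    x0 = np.zeros(n)
    for inner in (0, 1, 2, n - 1):
        x = two_stage_gs(A, b, x0, outer_sweeps=20, inner_sweeps=inner)
        print(inner, np.linalg.norm(b - A @ x))
    print("GS", np.linalg.norm(b - A @ classic_gs(A, b, x0, 20)))
```

Because D⁻¹L is strictly lower triangular, the inner recurrence in this sketch reproduces the exact triangular solve after at most n − 1 sweeps. The practical case described in the abstract is the opposite extreme, where one or two inner sweeps are enough and each sweep is a matrix-vector product that parallelizes well.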


Updated: 2021-04-06