Block low-rank single precision coarse grid solvers for extreme scale multigrid methods
Numerical Linear Algebra with Applications (IF 1.8), Pub Date: 2021-08-12, DOI: 10.1002/nla.2407
Alfredo Buttari 1 , Markus Huber 2 , Philippe Leleux 3 , Theo Mary 4 , Ulrich Rüde 3, 5 , Barbara Wohlmuth 2

Extreme scale simulation requires fast and scalable algorithms, such as multigrid methods. To achieve asymptotically optimal complexity, it is essential to employ a hierarchy of grids. The cost of solving the coarsest grid system can often be neglected in sequential computing, but cannot be ignored in massively parallel executions. In this case, the coarsest grid can be large and its efficient solution becomes a challenging task. We propose solving the coarse grid system using modern, approximate sparse direct methods and investigate the expected gains compared with traditional iterative methods. Since the coarse grid system only requires an approximate solution, we show that we can leverage block low-rank techniques, combined with the use of single precision arithmetic, to significantly reduce the computational requirements of the direct solver. In the case of extreme scale computing, the coarse grid system is too large for a sequential solution, but too small to permit massively parallel efficiency. We show that the agglomeration of the coarse grid system to a subset of processors is necessary for the sparse direct solver to achieve performance. We demonstrate the efficiency of the proposed method on a Stokes-type saddle point system solved with a monolithic Uzawa multigrid method. In particular, we show that the use of an approximate sparse direct solver for the coarse grid system can outperform that of a preconditioned minimal residual iterative method. This is demonstrated for the multigrid solution of systems of order up to $10^{11}$ degrees of freedom on a petascale supercomputer using 43,200 processes.
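To make the idea of combining block low-rank compression with single precision concrete, the following is a minimal NumPy sketch, not the authors' implementation. It compresses one dense block with a truncated SVD in `float32` and reports the resulting rank, relative error, and storage ratio. The function name `compress_block`, the tolerance `1e-4`, and the test block (a smooth kernel evaluated between two separated point clusters) are assumptions chosen for illustration only; the abstract does not specify how blocks are compressed.

```python
import numpy as np

def compress_block(block, tol):
    """Illustrative helper (hypothetical name): truncated SVD in single precision.

    Keeps the singular values larger than tol times the largest one and
    returns factors (left, right) such that block ~= left @ right.
    """
    u, s, vt = np.linalg.svd(block.astype(np.float32), full_matrices=False)
    rank = max(1, int(np.sum(s > tol * s[0])))
    return u[:, :rank] * s[:rank], vt[:rank, :]

# Assumed test data: interaction between two well-separated 1D point clusters.
# Such blocks are numerically low-rank, which is what compression exploits.
x = np.linspace(0.0, 1.0, 64)
y = np.linspace(3.0, 4.0, 64)
block = 1.0 / np.abs(x[:, None] - y[None, :])

left, right = compress_block(block, tol=1e-4)
rank = left.shape[1]
rel_err = np.linalg.norm(block - left @ right) / np.linalg.norm(block)
storage_ratio = (left.size + right.size) / block.size

print(f"rank={rank}, relative error={rel_err:.2e}, storage ratio={storage_ratio:.2f}")
```

The compressed block is accurate only to roughly the chosen tolerance, which matches the abstract's premise that the coarse grid system only requires an approximate solution.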

Updated: 2021-08-12