Point-block incomplete LU preconditioning with asynchronous iterations on GPU for multiphysics problems, The International Journal of High Performance Computing Applications



The International Journal of High Performance Computing Applications ( IF 3.1 ) Pub Date : 2020-12-28 , DOI: 10.1177/1094342020981153
Wenpeng Ma, Xiao-Chuan Cai


Point-block matrices arise naturally in multiphysics problems when all variables associated with a mesh point are ordered together. They differ from general block matrices in that the blocks are small enough that some of the diagonal blocks can often be inverted explicitly. Motivated by the recent work of Chow and Patel and of Chow et al., we propose an efficient incomplete LU (ILU) preconditioner for point-block matrices, targeting applications on GPUs. The construction of the preconditioner involves two critical steps: (1) choosing initial guesses for the lower and upper triangular matrices; and (2) several sweeps of asynchronous updates of the triangular matrices. Three representative problems are studied to show the advantage of the proposed point-block approach over the standard point-wise approach in terms of both the number of GMRES iterations and the total compute time. Moreover, we compare the proposed algorithm with the level-scheduling based parallel algorithm employed in NVIDIA’s cuSPARSE library and with the serial method implemented in the Intel MKL library; the experiments show that a 2×–5× speedup can be achieved over the block-based ILU(p) factorizations from the cuSPARSE library.
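The two construction steps named in the abstract can be illustrated with a small serial sketch. It is not the authors' GPU implementation: it applies the fixed-point update of Chow and Patel block by block, on the block sparsity pattern of A (the ILU(0) case), and it runs synchronous Jacobi-style sweeps as a stand-in for asynchronous updates. The initial guess (U taken as the block upper triangle of A, and L_ij = A_ij A_jj^{-1}), the dictionary storage, and the names `block_ilu0_sweeps` and `to_dense` are all assumptions made for illustration.

```python
import numpy as np

def block_ilu0_sweeps(blocks, sweeps=5):
    """Hypothetical helper: fixed-point sweeps for a point-block ILU(0).

    blocks maps (i, j) -> a (b, b) float array, one entry per nonzero block
    of A; every diagonal block must be present. Returns dictionaries L and U
    of blocks; L has an implicit identity on its block diagonal.
    """
    # Step 1 (assumed initial guess): U is the block upper triangle of A,
    # and each strictly lower block is scaled by the inverse diagonal block.
    L = {(i, j): a @ np.linalg.inv(blocks[(j, j)])
         for (i, j), a in blocks.items() if i > j}
    U = {(i, j): a.copy() for (i, j), a in blocks.items() if i <= j}

    # Step 2: sweeps. Each block is updated from the previous sweep's values
    # only, so the updates within one sweep are independent of each other.
    for _ in range(sweeps):
        L_old, U_old = L, U
        L, U = {}, {}
        for (i, j), a in blocks.items():
            s = a.copy()
            for k in range(min(i, j)):
                if (i, k) in L_old and (k, j) in U_old:
                    s -= L_old[(i, k)] @ U_old[(k, j)]
            if i > j:
                # L_ij = (A_ij - sum_{k<j} L_ik U_kj) U_jj^{-1}
                L[(i, j)] = s @ np.linalg.inv(U_old[(j, j)])
            else:
                # U_ij = A_ij - sum_{k<i} L_ik U_kj
                U[(i, j)] = s
    return L, U

def to_dense(L, U, n, b):
    """Assemble dense factors from the block dictionaries (for checking)."""
    Ld, Ud = np.eye(n * b), np.zeros((n * b, n * b))
    for (i, j), blk in L.items():
        Ld[i * b:(i + 1) * b, j * b:(j + 1) * b] = blk
    for (i, j), blk in U.items():
        Ud[i * b:(i + 1) * b, j * b:(j + 1) * b] = blk
    return Ld, Ud
```

For a block-tridiagonal matrix, the exact LU factorization creates no fill outside the pattern, so after enough sweeps the product of the assembled factors reproduces A. That makes a convenient sanity check for a sketch like this one. On a GPU, each block update would typically be assigned to its own thread or thread group and would read whatever values are currently in memory, which is where the asynchrony comes from.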


Updated: 2020-12-28
