Developing a Multi-GPU-Enabled Preconditioned GMRES with Inexact Triangular Solves for Block Sparse Matrices
Mathematical Problems in Engineering, published 2021-02-28, DOI: 10.1155/2021/6804723
Wenpeng Ma¹, Yiwen Hu¹, Wu Yuan², Xiazhen Liu²

Solving triangular systems is a key building block of the preconditioned GMRES algorithm. Inexact preconditioning has become attractive because it exposes a high degree of parallelism on accelerators. In this paper, we propose and implement an iterative, inexact block triangular solve on multiple GPUs based on PETSc's framework. In addition, by developing a distributed block sparse matrix-vector multiplication procedure and optimizing the vector operations, we build a multi-GPU-enabled preconditioned GMRES with a block Jacobi preconditioner. The implementation uses the GPU-Direct technique to avoid host-device memory copies. For performance comparison, we also evaluate the preconditioning step implemented with PETSc's own data structures and with the cuSPARSE library. The experiments show that the developed GMRES with inexact preconditioning on 8 GPUs achieves a speedup of up to 4.4x over the CPU-only implementation with exact preconditioning on 8 MPI processes.
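To illustrate why an iterative, inexact triangular solve is more parallel than classic forward substitution, here is a minimal serial sketch. It assumes plain Jacobi sweeps as the iteration (the abstract does not name the exact iterative scheme), works on a scalar CSR triangular matrix rather than the paper's block sparse format, and runs on the CPU; the function name `jacobi_triangular_solve` is hypothetical and not part of PETSc or cuSPARSE.

```python
from typing import List


def jacobi_triangular_solve(indptr: List[int], indices: List[int],
                            data: List[float], b: List[float],
                            sweeps: int) -> List[float]:
    """Approximate x with T x = b for a sparse triangular T stored in CSR.

    Hypothetical sketch: runs a fixed number of Jacobi sweeps
    x <- D^{-1} (b - N x), where D is the diagonal of T and N its
    off-diagonal part. Because N is nilpotent for a triangular T, n sweeps
    reproduce the exact solution (in exact arithmetic); fewer sweeps give a
    cheaper, inexact solve that is still usable inside a preconditioner.
    """
    n = len(b)
    # Extract the diagonal once; a zero diagonal means T is singular.
    diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                diag[i] = data[k]
        if diag[i] == 0.0:
            raise ValueError(f"zero diagonal entry in row {i}")

    x = [0.0] * n  # initial guess x_0 = 0
    for _ in range(sweeps):
        # Each row reads only the previous iterate, so all rows of one sweep
        # are independent: this is the parallelism an accelerator can exploit,
        # unlike forward substitution, where row i waits for rows j < i.
        x_new = [0.0] * n
        for i in range(n):
            s = b[i]
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j != i:
                    s -= data[k] * x[j]
            x_new[i] = s / diag[i]
        x = x_new
    return x
```

For the lower-triangular system with rows (2), (1, 4), (0, 3, 5) and right-hand side (2, 5, 8), one sweep gives (1, 1.25, 1.6), two sweeps give approximately (1, 1, 0.85), and three sweeps give the exact solution (1, 1, 1). The number of sweeps is the knob that trades preconditioner accuracy for cost.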

Updated: 2021-02-28