Accelerating Geometric Multigrid Preconditioning with Half-Precision Arithmetic on GPUs
arXiv - CS - Mathematical Software. Pub Date: 2020-07-15, DOI: arxiv-2007.07539
Kyaw L. Oo, Andreas Vogel

With the hardware support for half-precision arithmetic on NVIDIA V100 GPUs, high-performance computing applications can benefit from lower precision at appropriate spots to speed up the overall execution time. In this paper, we investigate a mixed-precision geometric multigrid method to solve large sparse systems of equations stemming from the discretization of elliptic PDEs. While the final solution is always computed with high-precision accuracy, an iterative refinement approach with multigrid preconditioning in lower precision and residual scaling is employed. We compare the FP64 baseline for Poisson's equation to purely FP16 multigrid preconditioning and to the use of FP16-FP32-FP64 combinations within the mesh hierarchy. While the iteration count is barely affected by the use of lower precision, the solver runtime is considerably reduced due to the decreased memory transfer, and a speedup of up to 2.5x is obtained for the overall solver. We investigate the performance of selected kernels with the hierarchical Roofline model.

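The core mechanism of the abstract is mixed-precision iterative refinement: the residual and the solution update are kept in FP64, the preconditioner is applied in lower precision, and the residual is rescaled before the down-cast so that its entries stay within the representable range of the narrow format. Below is a minimal, self-contained C++ sketch of that outer loop, not the authors' implementation: a single-precision tridiagonal direct solve stands in for the FP16 multigrid V-cycle on the GPU, the 1D Poisson stencil stands in for the discretized elliptic PDE, and the names `apply_A` and `precondition_lo` are illustrative assumptions.

```cpp
// Mixed-precision iterative refinement sketch (illustrative only):
// residual and update in FP64, preconditioner applied in lower precision,
// with residual scaling before the down-cast.
#include <cstdio>
#include <cmath>
#include <vector>

using Lo = float;  // stand-in for FP16; the paper targets half precision on V100 GPUs

// FP64 matrix-vector product for the 1D Poisson stencil: (A x)_i = 2 x_i - x_{i-1} - x_{i+1}
static void apply_A(const std::vector<double>& x, std::vector<double>& y) {
    const int n = (int)x.size();
    for (int i = 0; i < n; ++i) {
        double l = (i > 0)     ? x[i - 1] : 0.0;
        double r = (i < n - 1) ? x[i + 1] : 0.0;
        y[i] = 2.0 * x[i] - l - r;
    }
}

// Low-precision preconditioner: a Thomas (tridiagonal) solve of A c = r in 'Lo'.
// In the paper this role is played by a geometric multigrid V-cycle in FP16.
static void precondition_lo(const std::vector<Lo>& r, std::vector<Lo>& c) {
    const int n = (int)r.size();
    std::vector<Lo> cp(n), dp(n);          // modified coefficients of the forward sweep
    cp[0] = -0.5f;
    dp[0] = r[0] * 0.5f;
    for (int i = 1; i < n; ++i) {
        Lo m = 2.0f + cp[i - 1];           // pivot after eliminating the sub-diagonal
        cp[i] = -1.0f / m;
        dp[i] = (r[i] + dp[i - 1]) / m;
    }
    c.resize(n);
    c[n - 1] = dp[n - 1];
    for (int i = n - 2; i >= 0; --i)       // back substitution
        c[i] = dp[i] - cp[i] * c[i + 1];
}

int main() {
    const int n = 16;
    std::vector<double> x(n, 0.0), b(n, 1.0), r(n), Ax(n);
    std::vector<Lo> r_lo(n), c_lo(n);

    for (int it = 0; it < 20; ++it) {
        // High-precision residual r = b - A x
        apply_A(x, Ax);
        double rnorm = 0.0;
        for (int i = 0; i < n; ++i) { r[i] = b[i] - Ax[i]; rnorm += r[i] * r[i]; }
        rnorm = std::sqrt(rnorm);
        std::printf("iter %2d  ||r|| = %.3e\n", it, rnorm);
        if (rnorm < 1e-12) break;

        // Residual scaling: normalize before casting down to low precision.
        double scale = 1.0 / rnorm;
        for (int i = 0; i < n; ++i) r_lo[i] = (Lo)(r[i] * scale);

        // Low-precision preconditioner solve, then cast up, unscale, and update.
        precondition_lo(r_lo, c_lo);
        for (int i = 0; i < n; ++i) x[i] += (double)c_lo[i] / scale;
    }
    return 0;
}
```

The residual norm printed each iteration drops rapidly even though the correction is computed in reduced precision, which is the behavior the abstract reports: the iteration count is essentially unchanged while the preconditioner works on narrower (and therefore cheaper to transfer) data.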
Updated: 2020-07-16