Semi-Lagrangian Vlasov simulation on GPUs



arXiv - CS - Mathematical Software, Pub Date: 2019-07-18, DOI: arxiv-1907.08316
Lukas Einkemmer

In this paper, our goal is to efficiently solve the Vlasov equation on GPUs. A semi-Lagrangian discontinuous Galerkin scheme is used for the discretization. Such kinetic computations are extremely expensive due to the high-dimensional phase space. The SLDG code, which is publicly available under the MIT license, abstracts the number of dimensions and uses a shared codebase for both GPU- and CPU-based simulations. We investigate the performance of the implementation on a range of both Tesla (V100, Titan V, K80) and consumer (GTX 1080 Ti) GPUs. Our implementation typically achieves approximately 470 GB/s on a single GPU and 1600 GB/s on four V100 GPUs connected via NVLink. This corresponds to a speedup of about a factor of ten (comparing a single GPU with a dual-socket Intel Xeon Gold node) and approximately a factor of 35 (comparing a single node with and without GPUs). In addition, we investigate the effect of single-precision computation on the performance of the SLDG code and demonstrate that a template-based, dimension-independent implementation can achieve good performance regardless of the dimensionality of the problem.
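
To illustrate the semi-Lagrangian idea mentioned in the abstract, here is a minimal sketch for the 1D constant-velocity advection equation f_t + v f_x = 0 on a periodic grid. It is not the SLDG scheme: it assumes point values and linear interpolation in place of the discontinuous Galerkin projection, and the function name `semi_lagrangian_step` and all parameters are hypothetical, chosen only for this example.

```python
import math

def semi_lagrangian_step(f, v, dt, dx):
    """Advance f_t + v f_x = 0 by one time step on a periodic grid.

    Each grid point x_i is traced back along its characteristic to
    x_i - v*dt, and f is linearly interpolated at that departure point.
    (Assumption: linear interpolation, not the paper's DG projection.)
    """
    n = len(f)
    shift = v * dt / dx              # displacement measured in cells
    out = [0.0] * n
    for i in range(n):
        s = i - shift                # departure point in index space
        j = math.floor(s)            # left neighbour of the departure point
        w = s - j                    # weight of the right neighbour, in [0, 1)
        out[i] = (1.0 - w) * f[j % n] + w * f[(j + 1) % n]
    return out

# Usage: a Gaussian pulse advected with a step of three cells (CFL number 3).
n = 64
dx = 1.0 / n
f0 = [math.exp(-100.0 * (i * dx - 0.5) ** 2) for i in range(n)]
f1 = semi_lagrangian_step(f0, v=1.0, dt=3 * dx, dx=dx)
```

Because the update follows characteristics backwards instead of differencing neighbouring cells, the time step is not limited by a CFL condition; in the usage above the pulse moves three cells in a single step.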


Updated: 2020-06-24
