K-Athena: a performance portable structured grid finite volume magnetohydrodynamics code
IEEE Transactions on Parallel and Distributed Systems (IF 5.3), Pub Date: 2021-01-01, DOI: 10.1109/tpds.2020.3010016
Philipp Grete, Forrest W. Glines, Brian W. O'Shea

Large scale simulations are a key pillar of modern research and require ever-increasing computational resources. Different novel manycore architectures have emerged in recent years on the way towards the exascale era. Performance portability is required to prevent repeated non-trivial refactoring of a code for different architectures. We combine Athena++, an existing magnetohydrodynamics (MHD) CPU code, with Kokkos, a performance portable on-node parallel programming paradigm, into K-Athena to allow efficient simulations on multiple architectures using a single codebase. We present profiling and scaling results for different platforms including Intel Skylake CPUs, Intel Xeon Phis, and NVIDIA GPUs. K-Athena achieves $>10^8$ cell-updates/s on a single V100 GPU for second-order double precision MHD calculations, and a speedup of 30 on up to 24,576 GPUs on Summit (compared to 172,032 CPU cores), reaching $1.94\times 10^{12}$ total cell-updates/s at 76 percent parallel efficiency. Using a roofline analysis we demonstrate that the overall performance is currently limited by DRAM bandwidth and calculate a performance portability metric of 62.8 percent. Finally, we present the implementation strategies used and the challenges encountered in maximizing performance. This will provide other research groups with a straightforward approach to prepare their own codes for the exascale era. K-Athena is available at https://gitlab.com/pgrete/kathena.
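
The scaling figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the paper. It assumes that the 76 percent parallel efficiency is a weak-scaling efficiency (per-GPU rate at scale divided by the single-GPU rate) and that the speedup of 30 is a ratio of total throughput between the GPU and CPU runs. All variable and function names are illustrative.

```python
# Figures quoted in the abstract.
TOTAL_RATE = 1.94e12          # total cell-updates/s on Summit GPUs
N_GPUS = 24_576               # GPUs used in the largest run
N_CPU_CORES = 172_032         # CPU cores in the comparison run
PARALLEL_EFFICIENCY = 0.76    # quoted parallel efficiency
SPEEDUP_GPU_VS_CPU = 30.0     # quoted GPU-over-CPU speedup


def per_device_rate(total_rate: float, n_devices: int) -> float:
    """Average throughput per device in cell-updates/s."""
    return total_rate / n_devices


def implied_single_device_rate(total_rate: float, n_devices: int,
                               efficiency: float) -> float:
    """Single-device rate implied by the rate at scale.

    ASSUMPTION: efficiency = (per-device rate at scale) / (single-device rate).
    """
    return per_device_rate(total_rate, n_devices) / efficiency


rate_per_gpu = per_device_rate(TOTAL_RATE, N_GPUS)
single_gpu_rate = implied_single_device_rate(
    TOTAL_RATE, N_GPUS, PARALLEL_EFFICIENCY)
cores_per_gpu = N_CPU_CORES / N_GPUS

# ASSUMPTION: the speedup compares total throughput of the two runs.
implied_cpu_total_rate = TOTAL_RATE / SPEEDUP_GPU_VS_CPU

print(f"per-GPU rate at scale:   {rate_per_gpu:.3e} cell-updates/s")
print(f"implied single-GPU rate: {single_gpu_rate:.3e} cell-updates/s")
print(f"CPU cores per GPU:       {cores_per_gpu:.1f}")
print(f"implied CPU total rate:  {implied_cpu_total_rate:.3e} cell-updates/s")
```

Under these assumptions the implied single-GPU rate comes out slightly above $10^8$ cell-updates/s, which agrees with the single V100 figure in the abstract, and the comparison corresponds to 7 CPU cores per GPU.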

Updated: 2021-01-01