GPU acceleration of MPAS microphysics WSM6 using OpenACC directives: Performance and verification
Computers & Geosciences (IF 4.4), Pub Date: 2021-01-01, DOI: 10.1016/j.cageo.2020.104627
Jae Youp Kim, Ji-Sun Kang, Minsu Joh

Abstract We attempted to accelerate a microphysics scheme embedded in a next-generation climate/weather numerical model, the Model for Prediction Across Scales (MPAS), using OpenACC directives. Because it is one of the most time-consuming physics parameterization schemes, we focused on parallelizing the Weather Research and Forecasting (WRF) single-moment 6-class microphysics scheme (WSM6) on a Graphics Processing Unit (GPU). We applied several essential methodologies to optimize WSM6 performance on the GPU, minimizing data transfer between the Central Processing Unit (CPU) and the GPU and reducing the waste of GPU threads during computation. As a result, GPU runs on one Tesla V100 were on average 4.29 times faster than Message Passing Interface (MPI) runs on 20 CPU cores, including I/O communication between the CPU and GPU. When the whole model was ported onto the GPU, we achieved a 10.44x speedup of the WSM6 computation, which allowed us to measure the acceleration of WSM6 without I/O communication. In addition, we developed a precise verification method that distinguishes nonlinear chaotic error growth from differences introduced by GPU computation, taking into account the characteristics of the major output variables of WSM6. For a fair comparison, we compared the difference between CPU and GPU runs with the difference between CPU runs built with different compilers. We also examined bias in these differences, which can distort the climatology of a model simulation. Our approach passed this verification process. This represents the first successful application of GPU acceleration to the realistic full-model integration of MPAS.
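
The verification idea in the abstract can be illustrated with a small sketch: treat the difference between two CPU runs built with different compilers as the baseline, and ask whether the CPU-versus-GPU difference stays within it and shows no systematic bias. The abstract does not give the authors' actual metrics or thresholds, so everything here is an assumption made for illustration: the function names (`rmsd`, `mean_bias`, `passes_verification`), the choice of root-mean-square difference, the `tolerance` parameter, and the toy data.

```python
import math


def rmsd(a, b):
    """Root-mean-square difference between two equally sized fields."""
    if len(a) != len(b):
        raise ValueError("fields must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def mean_bias(a, b):
    """Mean signed difference a - b; a persistent nonzero value hints at drift."""
    if len(a) != len(b):
        raise ValueError("fields must have the same length")
    return sum(x - y for x, y in zip(a, b)) / len(a)


def passes_verification(cpu_ref, cpu_alt, gpu, tolerance=1.0):
    """Hypothetical acceptance rule (not the paper's): the CPU-vs-GPU
    difference must not exceed `tolerance` times the difference between
    two CPU runs built with different compilers."""
    baseline = rmsd(cpu_ref, cpu_alt)   # compiler-induced difference
    candidate = rmsd(cpu_ref, gpu)      # GPU-induced difference
    return candidate <= tolerance * baseline


# Toy values standing in for one WSM6 output variable (made-up numbers).
cpu_ref = [1.0, 2.0, 3.0, 4.0]      # CPU run, compiler A
cpu_alt = [1.5, 1.5, 3.5, 3.5]      # CPU run, compiler B
gpu = [1.25, 1.75, 3.25, 3.75]      # GPU run

print(rmsd(cpu_ref, cpu_alt), rmsd(cpu_ref, gpu), mean_bias(gpu, cpu_ref))
print(passes_verification(cpu_ref, cpu_alt, gpu))
```

In practice such a comparison would be repeated for each output variable and forecast time, since chaotic error growth makes both differences grow as the integration proceeds.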

Updated: 2021-01-01