Development of a Parallel Explicit Finite-Volume Euler Equation Solver using the Immersed Boundary Method with Hybrid MPI-CUDA Paradigm
Journal of Mechanics (IF 1.5), Pub Date: 2019-10-11, DOI: 10.1017/jmech.2019.9
F. A. Kuo, C. H. Chiang, M. C. Lo, J. S. Wu

This study proposes the application of a novel immersed boundary method (IBM) for treating irregular geometries on Cartesian computational grids in high-speed compressible gas flows modelled with the unsteady Euler equations. Because the numerical solution of multi-dimensional continuity equations is computationally intensive, the method is accelerated with multiple graphics processing units (GPUs), specifically using NVIDIA's CUDA together with MPI. Because efficient multi-GPU computation requires a high degree of locality, the Split Harten-Lax-van-Leer (SHLL) scheme is employed for vector splitting of fluxes across cell interfaces. The NVIDIA Visual Profiler shows that the proposed method reaches a computational speed of 98.6 GFLOPS, which is 61% of the 160 GFLOPS theoretical speed given by Roofline analysis at an average of 2.225 operations/byte. To demonstrate the validity of the method, results from several benchmark problems covering both subsonic and supersonic flow regimes are presented. Performance testing on 96 GPU devices demonstrates a speed-up of 89 times over a single GPU (i.e. 92% efficiency) for a benchmark problem employing 48 million cells. Communication overhead and parallel efficiency for varying problem sizes are also discussed.
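The two efficiency figures quoted in the abstract follow from simple ratios, and a minimal sketch of that arithmetic is given here. The function names are hypothetical and chosen only for illustration; the inputs are the numbers stated in the abstract, and nothing below is taken from the authors' code.

```python
def fraction_of_roofline(measured_gflops: float, roofline_gflops: float) -> float:
    """Measured throughput as a fraction of the Roofline bound."""
    return measured_gflops / roofline_gflops


def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Strong-scaling efficiency: speed-up over one device divided by device count."""
    return speedup / n_devices


# Figures quoted in the abstract.
roofline_fraction = fraction_of_roofline(98.6, 160.0)  # 98.6 GFLOPS vs. 160 GFLOPS bound
efficiency_96 = parallel_efficiency(89.0, 96)          # 89x speed-up on 96 GPUs

print(f"Fraction of Roofline bound: {roofline_fraction:.1%}")  # 61.6%
print(f"Parallel efficiency:        {efficiency_96:.1%}")      # 92.7%
```

Both results agree with the abstract's 61% and 92% once truncated to whole percentages.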

Updated: 2019-10-11