STREAmS: A high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flows
Computer Physics Communications (IF 7.2), Pub Date: 2021-02-20, DOI: 10.1016/j.cpc.2021.107906
Matteo Bernardini, Davide Modesti, Francesco Salvadore, Sergio Pirozzoli

We present STREAmS, an in-house high-fidelity solver for direct numerical simulation (DNS) of canonical compressible wall-bounded flows, namely turbulent plane channel flow, the zero-pressure-gradient turbulent boundary layer, and supersonic oblique shock-wave/boundary-layer interaction. The solver incorporates state-of-the-art numerical algorithms specifically designed to cope with the challenges of simulating high-speed turbulent flows, and it can be used across a wide range of Mach numbers, from the low subsonic up to the hypersonic regime. From the computational viewpoint, STREAmS targets modern HPC platforms through MPI parallelization and the ability to run on multi-GPU architectures. This paper discusses the main implementation strategies, with particular reference to the CUDA paradigm, the management of a single code for traditional and multi-GPU architectures, and the optimization process to take advantage of the latest generation of NVIDIA GPUs. Performance measurements show that single-GPU optimization more than halves the computing time compared with the baseline version. At the same time, the asynchronous patterns implemented in STREAmS for MPI communications give very good parallel performance, especially in weak scaling, with efficiency exceeding 97% on 1024 GPUs. For an overall evaluation of STREAmS against other compressible solvers, a comparison with a recent GPU-enabled community solver is presented. Although STREAmS is much more limited in the flow configurations it can address, its advantage in accuracy, computing time, and memory footprint is substantial, which makes it an ideal candidate for large-scale simulations of high-Reynolds-number compressible wall-bounded turbulent flows. The solver is released as open source under the GPLv3 license.

Program summary

Program Title: STREAmS

CPC Library link to program files: https://doi.org/10.17632/hdcgjpzr3y.1

Developer’s repository link: https://github.com/matteobernardini/STREAmS

Code Ocean capsule: https://codeocean.com/capsule/8931507/tree/v2

Licensing provisions: GPLv3

Programming language: Fortran 90, CUDA Fortran, MPI

Nature of problem: Solving the three-dimensional compressible Navier–Stokes equations for low and high Mach regimes in a Cartesian domain configured for channel, boundary layer or shock-boundary layer interaction flows.
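
For orientation, the governing equations named above can be written in conservative form as follows. This is a sketch under stated assumptions: a calorically perfect gas, Newtonian viscous stresses, and Fourier heat conduction. The summary does not spell out the constitutive model or any flow-specific forcing terms, so the closures and symbols below are the standard textbook choices rather than a statement of what the code implements.

```latex
\begin{align}
\frac{\partial \rho}{\partial t}
  + \frac{\partial (\rho u_j)}{\partial x_j} &= 0, \\
\frac{\partial (\rho u_i)}{\partial t}
  + \frac{\partial (\rho u_i u_j)}{\partial x_j}
  + \frac{\partial p}{\partial x_i}
  - \frac{\partial \sigma_{ij}}{\partial x_j} &= 0, \\
\frac{\partial (\rho E)}{\partial t}
  + \frac{\partial (\rho u_j H)}{\partial x_j}
  - \frac{\partial (\sigma_{ij} u_i)}{\partial x_j}
  + \frac{\partial q_j}{\partial x_j} &= 0,
\end{align}
% Assumed closures (perfect gas, Newtonian fluid, Fourier conduction):
\begin{gather}
p = \rho R T, \qquad
E = c_v T + \tfrac{1}{2} u_i u_i, \qquad
H = E + \frac{p}{\rho}, \\
\sigma_{ij} = \mu \left( \frac{\partial u_i}{\partial x_j}
  + \frac{\partial u_j}{\partial x_i}
  - \frac{2}{3} \frac{\partial u_k}{\partial x_k} \delta_{ij} \right), \qquad
q_j = -k \frac{\partial T}{\partial x_j}.
\end{gather}
```

Here \rho is the density, u_i the velocity components, p the pressure, T the temperature, E and H the total energy and total enthalpy per unit mass, \mu the dynamic viscosity, and k the thermal conductivity.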

Solution method: The convective terms are discretized using a hybrid energy-conservative shock-capturing scheme in locally conservative form. Shock-capturing capabilities rely on Lax–Friedrichs flux vector splitting and weighted essentially non-oscillatory (WENO) reconstruction. The system is advanced in time using a three-stage, third-order Runge–Kutta (RK) scheme. A two-dimensional pencil-distributed MPI parallelization is implemented alongside different patterns of GPU-accelerated (CUDA Fortran) routines.
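
The shock-capturing and time-stepping ingredients listed above can be illustrated on a one-dimensional scalar conservation law. The following is a minimal sketch, not the STREAmS implementation: it is plain Python rather than CUDA Fortran, it uses the classic fifth-order Jiang-Shu WENO weights with a global Lax–Friedrichs splitting, and it uses the Shu-Osher SSP RK3 scheme as a stand-in because the summary does not say which three-stage, third-order variant the solver uses. The energy-conservative central part of the hybrid scheme, the shock sensor, and the MPI/GPU layers are omitted. All function names (`weno5`, `rhs`, `ssp_rk3_step`, `run_demo`) are hypothetical.

```python
import math

EPS = 1e-6  # small number that keeps the WENO weights finite


def weno5(a, b, c, d, e):
    """Fifth-order WENO (Jiang-Shu) value at face i+1/2 from the
    upwind-biased stencil (i-2, i-1, i, i+1, i+2)."""
    # Smoothness indicators of the three candidate sub-stencils
    b0 = 13 / 12 * (a - 2 * b + c) ** 2 + 0.25 * (a - 4 * b + 3 * c) ** 2
    b1 = 13 / 12 * (b - 2 * c + d) ** 2 + 0.25 * (b - d) ** 2
    b2 = 13 / 12 * (c - 2 * d + e) ** 2 + 0.25 * (3 * c - 4 * d + e) ** 2
    # Nonlinear weights built from the ideal weights 1/10, 6/10, 3/10
    w0 = 0.1 / (EPS + b0) ** 2
    w1 = 0.6 / (EPS + b1) ** 2
    w2 = 0.3 / (EPS + b2) ** 2
    # Third-order candidate reconstructions
    q0 = (2 * a - 7 * b + 11 * c) / 6
    q1 = (-b + 5 * c + 2 * d) / 6
    q2 = (2 * c + 5 * d - e) / 6
    return (w0 * q0 + w1 * q1 + w2 * q2) / (w0 + w1 + w2)


def rhs(u, dx, flux, dflux):
    """Return -d f(u)/dx on a periodic grid in conservative form."""
    n = len(u)
    alpha = max(abs(dflux(v)) for v in u)  # global Lax-Friedrichs speed
    fp = [0.5 * (flux(v) + alpha * v) for v in u]  # right-going part
    fm = [0.5 * (flux(v) - alpha * v) for v in u]  # left-going part
    fhat = []  # fhat[i] is the numerical flux at face i+1/2
    for i in range(n):
        p = [fp[(i + k) % n] for k in (-2, -1, 0, 1, 2)]
        m = [fm[(i + k) % n] for k in (3, 2, 1, 0, -1)]  # mirrored stencil
        fhat.append(weno5(*p) + weno5(*m))
    # Flux difference across each cell; fhat[-1] is the periodic face -1/2
    return [-(fhat[i] - fhat[i - 1]) / dx for i in range(n)]


def ssp_rk3_step(u, dt, dx, flux, dflux):
    """One step of the three-stage, third-order Shu-Osher SSP RK scheme."""
    u1 = [a + dt * r for a, r in zip(u, rhs(u, dx, flux, dflux))]
    r1 = rhs(u1, dx, flux, dflux)
    u2 = [0.75 * a + 0.25 * (b + dt * r) for a, b, r in zip(u, u1, r1)]
    r2 = rhs(u2, dx, flux, dflux)
    return [a / 3 + 2 / 3 * (b + dt * r) for a, b, r in zip(u, u2, r2)]


def run_demo(n=40, cfl=0.5):
    """Advect a sine wave once around the periodic unit interval."""
    dx = 1.0 / n
    nsteps = math.ceil(1.0 / (cfl * dx))
    dt = 1.0 / nsteps  # unit wave speed, final time 1.0
    u0 = [math.sin(2 * math.pi * i * dx) for i in range(n)]
    u = list(u0)
    for _ in range(nsteps):
        u = ssp_rk3_step(u, dt, dx, lambda v: v, lambda v: 1.0)
    return u0, u


if __name__ == "__main__":
    u0, u = run_demo()
    print("max error after one period:", max(abs(a - b) for a, b in zip(u0, u)))
```

Because the update is written as a difference of face fluxes, the discrete sum of `u` is preserved up to round-off, which is the scalar analogue of the "locally conservative form" mentioned in the summary. Passing `flux=lambda v: 0.5 * v * v` and `dflux=lambda v: v` to `ssp_rk3_step` turns the same sketch into a Burgers solver that forms a shock.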




Updated: 2021-03-05