Elsevier

Computers & Fluids

Volume 215, 30 January 2021, 104787
Computers & Fluids

Direct numerical simulations of turbulent reacting flows with shock waves and stiff chemistry using many-core/GPU acceleration

https://doi.org/10.1016/j.compfluid.2020.104787Get rights and content

Highlights

  • Performance portability of a new reacting flow DNS solver, KARFS, is demonstrated.

  • Effects of different metrics on elucidating performance is highlighted.

  • DNS of detonation with stiff chemistry is studied under non-trivial conditions.

  • Potential of using WENO schemes in reacting flow DNS without shocks is presented.

Abstract

Compressible reacting flows may display sharp spatial variation related to shocks, contact discontinuities or reactive zones embedded within relatively smooth regions. The presence of such phenomena emphasizes the relevance of shock-capturing schemes such as the weighted essentially non-oscillatory (WENO) scheme as an essential ingredient of the numerical solver. However, these schemes are complex and have more computational cost than the simple high-order compact or non-compact schemes. In this paper, we present the implementation of a seventh-order, minimally-dissipative mapped WENO (WENO7M) scheme in a newly developed direct numerical simulation (DNS) code called KAUST Adaptive Reactive Flows Solver (KARFS). In order to make efficient use of the computer resources and reduce the solution time, without compromising the resolution requirement, the WENO routines are accelerated via graphics processing unit (GPU) computation. The performance characteristics and scalability of the code are studied using different grid sizes and block decomposition. The performance portability of KARFS is demonstrated on a variety of architectures including NVIDIA Tesla P100 GPUs and NVIDIA Kepler K20X GPUs. In addition, the capability and potential of the newly implemented WENO7M scheme in KARFS to perform DNS of compressible flows is also demonstrated with model problems involving shocks, isotropic turbulence, detonations and flame propagation into a stratified mixture with complex chemical kinetics.

Introduction

The pursuit of improved efficiency and lower emissions in internal combustion engines has been an ongoing quest for several decades and continues to attract significant attention due to its economic and environmental impact. Recent emphasis on reducing the harmful nitrogen oxide and greenhouse gases emissions has renewed the focus on developing better internal combustion (IC) technologies that can derive the full economic potential of fossil fuels for transportation applications while lowering emissions. Several advanced combustion concepts have emerged which hold the promise of simultaneously improving the efficiency and lowering the emissions of IC engines, while also allowing the utilization of renewable fuels [1], [2], [3]. The combustion concepts that are being pursued are quite different from the traditional spark ignition (SI) or compression ignition (CI) based engines. Ignition and combustion are expected to occur at conditions that are in the fringes of the traditional operating regimes of IC engines with limited prior understanding and experience. Furthermore, mixed modes of combustion have been observed to occur in which propagation of reaction fronts and uniform volumetric combustion exist simultaneously ([4], [5], [6], [7], [8], [9], [10] and references therein). There is currently only a limited fundamental understanding of the nature of combustion in an auto-ignitive mixture that is conducive to both spontaneous ignition and diffusion driven flame propagation. Several previous experimental and numerical studies have focused on characterizing the nature of combustion and delineating it into multiple regimes based on the propagation speed of the autoignition front and also based on the impact of turbulence and stratification on such a front [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. Fundamental numerical simulations that can capture the nature and characteristics of auto-ignition in stratified reactive mixtures are key to improving and applying the appropriate combustion models in device-scale simulations. As such, modeling and simulation will continue to play a very important role in improving the fundamental understanding of the combustion characteristics at such conditions, and in advancing the combustion technologies for practical applications.

Direct numerical simulation (DNS) of turbulent reacting flows is a first principle approach for turbulent combustion simulations that does not require closure models for turbulence physics and therefore free of underlying statistical and numerical assumptions [22]. DNS of reacting flows in canonical flow configurations such as triply periodic cube domains, quasi-one dimensional planar stabilized and fully three-dimensional spatially developing jet configurations have been used to study turbulence-chemistry interaction in reactive mixtures [4], [5], [23]. Since DNS resolves all of the relevant flow scales up to the smallest dissipative scales, it becomes prohibitively expensive to simulate large scale flow motion up to device scales. However, smaller canonical geometries have been within the reach of DNS for several years now. Newer developments in high performance computing (HPC) have allowed an increase in the range of length scales simulated, providing more realism in the turbulence physics, along with simultaneously increasing the chemical complexity, allowing the simulation of fuels and intermediates that are better surrogates of transportation fuels.

One of the primary objectives of DNS is to capture the turbulent flow physics as accurately as possible without introducing numerical distortion that would cause the turbulence energy cascade to be incorrect. Low dissipation by employing higher order accuracy have become essential for the fidelity of simulations. Higher order centered finite difference schemes [24] along with higher order accurate Runge–Kutta time integration schemes [25] have been very successful in enabling DNS with higher solution accuracy while also allowing efficient and affordable computations of large problems. Since the central difference scheme lacks the numerical dissipation needed to ensure stability, it is also customary to use a filter operator, such as the tenth order accurate explicit filter by Kennedy and Carpenter [24]. Such a combination of higher order accurate central difference schemes along with explicit filtering has proven to be well suited and sufficient for DNS of low speed turbulent combustion when the combustion fronts are sufficiently resolved on the numerical grid.

Despite their advantages, centered finite difference schemes are not suitable for performing DNS of supersonic combustion. Under normal conditions, the spatial profile of a propagating reaction front is determined by the transport rate of heat and radicals between the reaction zone and the upstream mixture. The thickness of a typical transport-driven flame front is determined by the molecular or turbulent diffusivity, in the case of laminar and turbulent flames, respectively. However, a spontaneous propagation front under auto-ignitive conditions is not limited by the transport rates. The thickness of a spontaneous propagation front is determined primarily by the gradients in reactivity of the upstream mixture and the effects of transport rate are secondary. When a spontaneously propagating front encounters an adverse gradient in reactivity, it can turn into a sharp contact discontinuity between the unreacted and reacted mixtures, with steep gradients in density, temperature and composition. Such a contact discontinuity is distinct from a detonation, since there is no large pressure jump between the unburned and burned mixtures. The contact discontinuity can only be resolved by the central difference schemes seldomly and often requires the use of a numerical scheme capable of capturing discontinuities. Similarly, a detonation may occur when a spontaneously propagating front acheives resonance with the accompanying pressurc wave, thereby rendering the standard centered finite difference schemes ineffective in capturing the underlying physics.

Weighted essentially non-oscillatory (WENO) schemes [26] are a class of numerical methods that have been shown to be capable of capturing discontinuities in compressible flows, while also being suitable for DNS. WENO schemes obtain a numerical approximation to the advective fluxes in the governing equation using a weighted sum of fluxes computed using multiple candidate stencils. The weights are determined by adapting to the presence of discontinuities such that the solution tends to approach high accuracy and low dissipation in smooth regions of the flow. WENO schemes are considerably more expensive than regular central difference schemes due to the need to evaluate the same flux functions from multiple candidate stencils [27]. However, the numerical method is suitable for implementation in finite difference codes using stencil operations very similar to the central difference schemes. Also, the usual advantages of DNS using central difference schemes such as parallelism and scalability carries over to the WENO schemes as well. Apart from DNS, these advantages of WENO schemes have also been demonstrated in implicit large eddy simulations (iLES) [28], [29], [30].

Success in the general application of computational fluid dynamics (CFD) and affiliated techniques such as DNS is largely dependent on achieving an optimal balance between multi-scale fidelity, multi-physics detail and the overall computational cost. In this regard, recent advances in computational architectures offer both potential and challenges. Dramatic speed-ups can be potentially achieved across platforms using state-of-the-art multi-core central processing units (CPUs) and graphics processing units (GPUs). The challenge, however, is actually attaining this potential through development of new parallel programming models that can be used to efficiently port various algorithms and the related code kernels to these new heterogeneous architectures. A major challenge in utilizing heterogeneous resources is the diversity of devices on different machines, which provide widely varying performance characteristics. A program or algorithm optimized for one architecture may not run as well on the next generation of processors or on a device from a different vendor. A program or algorithm optimized for GPU execution is often very different from the one optimized for CPU execution. The relative performance of CPUs and GPUs also varies between machines. On one machine, a specific portion of a computation may run best on the CPU, while on another machine it may run best on the GPU. In some cases, it is best to balance the workload between the CPU and the GPU; in other cases, it may be best to execute an algorithm on a device where it runs more slowly but closer to where its output is needed, in order to avoid expensive data transfer operations. It is not uncommon for large application codes to have several different implementations, with each one optimized for a different architecture. This software development approach leads to a code maintainability issue: every new change to the code needs to be implemented in all versions of the code. Performance portability of a single code base, therefore, has become a critical issue: the parallel code needs to execute correctly and remain performant on machines with different architectures, operating systems, and software libraries.

Several groups have published their efforts for accelerating the flow solvers used for multi-component reacting flow calculations using GPUs. The most common approach is to select the most time consuming and compute-intensive kernels and offload them to the GPUs for accelerated solution [31]. In the case of combustion solvers, the chemical reaction kinetics remains the predominant kernel which is routinely offloaded to the GPUs. Spafford and co-workers [32] were the first to use this approach to offload the chemical kinetics evaluation to a GPU using the CUDA programming model and thereby accelerate a DNS solver for turbulent combustion. Their approach was to use grid-level parallelism for acceleration by computing the reaction kinetics across the large number of grid points in parallel on the GPU. In contrast, Shi et al. [33], utilized the parallelism available in the reaction network itself to simultaneously calculate all the reaction rates for a single kinetic system. They observed a considerable speedup for large chemical reaction mechanisms (>1000 species), but could not obtain any acceleration for smaller mechanisms (<100 species). The performance of implicit integration schemes for stiff chemical kinetics against the speedups obtained for explicit integration has also been measured by Stone et al. [34]. More recently, Niemeyer et al. [35] used fully explicit and stabilized explicit methods for accelerating chemical kinetics with low to moderate levels of stiffness. In these studies, it was found that the potential for acceleration of chemical kinetics through GPUs is large when explicit integration is feasible. But in the presence of stiffness, the performance on GPUs was highly susceptible to thread divergence due to varying levels of stiffness across the multiple states.

We introduced a new DNS solver - KARFS (KAUST Adaptive Reacting Flows Solver) for combustion calculations in a previous research article [36], showcasing its application in multidimensional turbulent reacting flow simulations. KARFS is a modern DNS code developed in C++ using modern parallel programming patterns, distributed memory parallelism through message passing interface (MPI) and portable on-node parallelism with the Kokkos C++ programming model [37], [38]. KARFS has performance-portable capabilities for multi- and many-core heterogeneous platforms, which is instrumental to meet emerging demands of exascale computing machines. The employed Kokkos framework allows KARFS to straighforwardly utilize the same source code on CPUs and GPUs while maintaining performance comparable to the code that is optimally tuned to either processing unit. We have taken the approach to use standard C++ programming practices for both on- and off-node parallelism. This has been the main motivation for choosing the Kokkos programming model, in which all kernel launches on a target device and all memory copies between memory spaces are asynchronous by default. With further development of Kokkos to allow multiple stream/task parallelism, it will be possible to schedule multiple streams of tasks to hide latencies. We believe that this will be the most portable programming model for future HPC application. Current choices for asynchronous execution or data management requires vendor specific extensions to the standard programming languages. Furthermore, they require compiler extensions through Pragmas that are not widely supported by all open source compilers. Other available runtime systems cannot currently inter-operate with Kokkos-type programming abstractions. We do not find any of these approaches attractive and hence will track the task parallelism model being developed in Kokkos.

In this paper, the numerical methods available in KARFS have been extended to include a seventh-order mapped WENO (WENO7M) scheme [39] for capturing discontinuities and a stiff ordinary differential equation (ODE) solver for dealing with stiff and complex chemistry. The computational loops and kernels for the WENO scheme are written using Kokkos parallel patterns so as to provide performance portability. Similarly, the linear algebra computations related to the stiff-ODE integrator are accelerated by making use of the MAGMA library [40], which was designed for hybrid computing architectures. In a recent study [41], it was demonstrated that the standard seventh-order WENO scheme developed by Jiang and Shu [42] is less dissipative compared to the corresponding standard fifth-order WENO scheme. Moreover, the standard seventh-order WENO scheme cannot exactly guarantee seventh-order accuracy since the nonlinear weights do not attain optimal values near the critical points. Henrick et al. [39] suggested a mapping technique such that the formal order of accuracy is efficiently recovered near critical points. Subsequent simulations of pulsating one-dimensional detonations demonstrated true fifth order accuracy [43]. These are the main reasons for specifically choosing the WENO7M scheme. Note, however, that it is not the only scheme suitable for reacting flow DNS. Alternatives could also be pure WENO schemes with enhanced smoothness measures, such as the WENO-Z schemes [44] or schemes with symmetric candidate stencils and bandwidth optimized weights, such as the WENO-SYMBO schemes [45].

The main scope of the paper is threefold: (1) to present the implementation and validation of the newly implemented WENO7M scheme and the stiff ODE-solver in KARFS, (2) to investigate the performance characteristics of KARFS by adopting various types of parallelism strategies and (3) to demonstrate execution of KARFS on a variety of architectures including NVIDIA Tesla P100 GPUs and NVIDIA Kepler K20X GPUs without making any changes to the source code.

Section snippets

Governing equations

The DNS code KARFS [36] solves the fully compressible continuity, momentum, total energy, and species equations in conservative form with detailed chemistry for a mixture of ideal gases on structured, Cartesian gridsρt=(ρui)xi,(ρui)t=(ρuiuj)xjPxi+τijxj,(ρet)t=(ρetuj)xj(Puj)xj+(τijui)xjqjxj,(ρYk)t=(ρYkuj)xjJk,jxj+ω˙k, where the Einstein summation convention is implied, ρ is the density, ui is the Cartesian velocity component in the ith coordinate direction (i=1

Numerical method

In its original formulation, KARFS [36] solves the compressible Navier–Stokes, species and energy equations fully explicitly employing a fourth-order, six-stage explicit Runge–Kutta operator for time integration [25]. To add robustness and performance gains when dealing with stiff and complex chemistry, an efficient stiff-ODE solver and a second-order operator splitting algorithm have also been implemented. Depending upon the nature of the problem, spatial discretization can be done using

Sod’s shock tube problem

The popular Sod shock tube problem [52] constitutes probably one of the most standard numerical benchmarks designed for compressible flow solvers. This test consists of a fluid, initially at rest, in which a virtual membrane located at the center of the domain separates two distinct sections: the one on the left at a higher density and pressure, and the one on the right at a lower density and pressure. The membrane is removed at t=0 and a shock wave develops propagating toward the right,

Performance characteristics

In this section, we study the performance characteristics of KARFS and its scalability using different block sizes. The primary system used in this work is the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). Titan is a hybrid architecture Cray XK7 system, consisting of 18,688 compute nodes that are interconnected by Cray’s Gemini network. Each node is composed of a 16-core AMD Opteron processor and an NVIDIA Tesla K20X graphics processing unit (GPU) as an accelerator.

Conclusions

Using the MPI + X programming model, the implementation and validation of a seventh-order, minimally dissipative, mapped WENO (WENO7M) scheme and an efficient operator-splitting method in KARFS are demonstrated. The MPI + X programming model relies on Kokkos for “X” for performance portability to multi-core, many-core and GPUs. The capability and potential of the newly implemented WENO7M scheme in KARFS to perform DNS of compressible flows is also demonstrated with model problems involving

CRediT authorship contribution statement

Swapnil Desai: Conceptualization, Methodology, Software, Writing - original draft, Data curation, Investigation, Formal analysis, Validation, Visualization. Yu Jeong Kim: Data curation, Investigation, Validation. Wonsik Song: Data curation, Investigation, Validation. Minh Bau Luong: . Francisco E. Hernández Pérez: Data curation, Investigation, Validation, Visualization, Formal analysis. Ramanan Sankaran: Writing - review & editing, Supervision, Resources. Hong G. Im: Writing - review & editing,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was sponsored by competitive research funding from King Abdullah University of Science and Technology (KAUST). This research used resources of the computer clusters at KAUST Supercomputing Laboratory (KSL), the Oak Ridge Leadership Computing Facility and the Compute Data and Environment for Science (CADES) at ORNL, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

References (71)

  • C. Strozzi et al.

    Experimental analysis of propagation regimes during the autoignition of a fully premixed methane - air mixture in the presence of temperature inhomogeneities

    Combust Flame

    (2012)
  • X. Ma et al.

    An optical study of in-cylinder CH2O and OH chemiluminescence in flame-induced reaction front propagation using high speed imaging

    Fuel

    (2014)
  • G. Bansal et al.

    Autoignition and front propagation in low temperature combustion engine environments

    Combust Flame

    (2011)
  • S. Gupta et al.

    Classification of ignition regimes in HCCI combustion using computational singular perturbation

    Proc Combust Inst

    (2011)
  • S. Gupta et al.

    Analysis of n-heptane auto-ignition characteristics using computational singular perturbation

    Proc Combust Inst

    (2013)
  • P. Pal et al.

    Computational characterization of ignition regimes in a syngas/air mixture with temperature fluctuations

    Proc Combust Inst

    (2017)
  • M.B. Luong et al.

    Prediction of ignition modes of NTC-fuel/air mixtures with temperature and concentration fluctuations

    Combust Flame

    (2020)
  • G.H. Yu et al.

    Ignition characteristics of a temporally evolving n-heptane jet in an iso-octane/air stream under RCCI combustion-relevant conditions

    Combust Flame

    (2019)
  • S. Desai et al.

    Auto-ignitive deflagration speed of methane (CH4) blended dimethyl-ether (DME)/air mixtures at stratified conditions

    Combust Flame

    (2020)
  • R. Sankaran et al.

    The effects of non-uniform temperature distribution on the ignition of a lean homogeneous hydrogen air mixture

    Proc Combust Inst

    (2005)
  • C.A. Kennedy et al.

    Several new numerical methods for compressible shear-layer simulations

    Appl Numer Math

    (1994)
  • C.A. Kennedy et al.

    Low-storage, explicit Runge–Kutta schemes for the compressible Navier–Stokes equations

    Appl Numer Math

    (2000)
  • X.-D. Liu et al.

    Weighted essentially non-oscillatory schemes

    J Comput Phys

    (1994)
  • K. Ritos et al.

    Physical insight into the accuracy of finely-resolved iLES in turbulent boundary layers

    Comput Fluids

    (2018)
  • K. Ritos et al.

    Performance of high-order implicit large eddy simulations

    Comput Fluids

    (2018)
  • Y. Shi et al.

    Redesigning combustion modeling algorithms for the graphics processing unit (GPU): chemical kinetic rate evaluation and ordinary differential equation integration

    Combust Flame

    (2011)
  • K.E. Niemeyer et al.

    Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs

    J Comput Phys

    (2014)
  • F.E. Hernández Pérez et al.

    Direct numerical simulations of reacting flows with detailed chemistry using many-core/GPU acceleration

    Comput Fluids

    (2018)
  • A.K. Henrick et al.

    Mapped weighted essentially non-oscillatory schemes: achieving optimal order near critical points

    J Comput Phys

    (2005)
  • S. Tomov et al.

    Towards dense linear algebra for hybrid GPU accelerated manycore systems

    Parallel Comput

    (2010)
  • S. Zhao et al.

    Comparison of improved finite-difference WENO schemes for the implicit large eddy simulation of turbulent non-reacting and reacting high-speed shear flows

    Comput Fluids

    (2014)
  • G.-S. Jiang et al.

    Efficient implementation of weighted ENO schemes

    J Comput Phys

    (1996)
  • A.K. Henrick et al.

    Simulations of pulsating one-dimensional detonations with true fifth order accuracy

    J Comput Phys

    (2006)
  • R. Borges et al.

    An improved weighted essentially non-oscillatory scheme for hyperbolic conservation laws

    J Comput Phys

    (2008)
  • M.P. Martín et al.

    A bandwidth-optimized WENO scheme for the effective direct numerical simulation of compressible turbulence

    J Comput Phys

    (2006)
  • Cited by (35)

    View all citing articles on Scopus

    Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan http://energy.gov/downloads/doe-public-access-plan.

    View full text