Using a Gaussian process regression inspired method to measure agreement between the experiment and CFD simulations

https://doi.org/10.1016/j.ijheatfluidflow.2019.108497

Highlights

  • The Gaussian process regression (GPR) model is used to fit the experimental data.

  • The agreement between the experiment and a simulation is assessed by comparing the outputs of the simulation and the GPR model.

  • Two metrics are used to provide tangible information on the local and global agreement, respectively.

  • The quantitative information helps to make an objective argument for the accuracy level of a CFD model.

Abstract

This paper presents a Gaussian process regression inspired method to measure the agreement between experiment and computational fluid dynamics (CFD) simulation. Because of misalignments between experimental and numerical outputs in spatial or parameter space, experimental data are not always suitable for quantitatively assessing numerical models. In the proposed method, a cross-validated Gaussian process regression (GPR) model, trained on the experimental measurements, is used to mimic the measurements at positions where no experimental data exist. The agreement between the experiment and a simulation is then approximated by the agreement between the simulation and the GPR model. The statistically weighted squared error provides tangible information on the local agreement, while the standardised Euclidean distance is used to assess the overall agreement.

The method is then used to assess the performance of four scale-resolving CFD methods, namely URANS k-ω-SST, SAS-SST, SAS-KE, and IDDES-SST, in simulating a prism bluff-body flow. The local statistically weighted squared error, together with the standardised Euclidean distance, provides additional insight over and above qualitative graphical comparisons. In this example scenario, the SAS-SST model marginally outperformed the IDDES-SST and clearly outperformed the other two models, according to the distance to the validated GPR models.

Introduction

Understanding complex flow phenomena is an important task in many fields, e.g., the performance of electronic devices (Conficoni et al., 2015; Kaplan et al., 2013), flow-structure interaction (Glück et al., 2001; Hou et al., 2012), thermal hydraulics in complex systems (Amin et al., 2018; Duan et al., 2014; Duan and He, 2017a, 2017b), and the dispersion of air pollutants (Al-Abidi et al., 2013; Allegrini et al., 2014; Gilani et al., 2016; Gromke et al., 2015). Thanks to the rapid development of commercial computational fluid dynamics (CFD) software in conjunction with growing computing power, the replacement of full-scale experiments with sophisticated CFD models is a growing trend. However, high-fidelity CFD methods, such as DNS and LES, are still impractical in most industrial cases. The most widely used CFD methods, such as RANS and hybrid LES/RANS, are subject to various levels of simplification and assumption, which introduce model bias into the predictions. Therefore, identifying the suitability of a CFD model for a specific scenario is an ongoing endeavour.

Graphical comparison is a common approach to assessing the performance of CFD models or to validating and calibrating turbulence models, for instance in (Billard et al., 2012; Duan et al., 2019, 2018a, 2018b; Keshmiri et al., 2016b, 2016a; Launder and Spalding, 1974; Menter, 2009; Revell et al., 2006; Shih et al., 1995). This approach qualitatively determines the agreement between a model and experiments by inspecting the data plotted in figures. By its nature, the conclusion of a graphical comparison is observer dependent and may be biased, especially when the results of different models are close to each other. Quantitative assessment of the agreement between CFD models and measurements is therefore important for providing a more objective basis for ranking and validating the models. The widely accepted goal of validation is the determination of the degree to which a model is an accurate representation of the physics from the perspective of the intended uses of the model (AIAA, 1998; ASME, 2006; Oberkampf and Trucano, 2008; Oberkampf et al., 2002).

A review of previously developed validation methods can be found in Lee et al. (2016) and Ling and Mahadevan (2013). Methods for assessing the validity of a numerical model can be classified into two types, namely hypothesis tests and metric assessments. A hypothesis test aims to accept or reject the model, whilst metric assessments focus on quantifying the agreement between the model and the experiment. Hypothesis tests, such as the p-value test and the Bayes factor, were developed by considering the statistics of either the experiment or the numerical simulation. Meanwhile, many metric assessment methods have been developed and assessed, such as the Euclidean distance (Audouin et al., 2011; Peacock et al., 1999), the Mahalanobis distance (Rebba and Mahadevan, 2006; Zhao et al., 2017), the confidence interval (Barone et al., 2006; Oberkampf and Barone, 2006), and area metrics (Ferson et al., 2008; Ferson and Oberkampf, 2009). Both the Euclidean distance and the Mahalanobis distance measure the difference between two vectors. The former treats the squared error of each element in the vector equally, while the latter accounts for the statistical features of the elements. The confidence interval is designed to provide the confidence level of the model error. Area metrics assess the similarity of the cumulative distributions of different stochastic processes obtained from experiments and simulations.
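As a concrete illustration of the difference between the two vector metrics just mentioned, the following sketch computes both distances for the same pair of vectors. All numbers and variable names here are invented for illustration, not data from the paper:

```python
import numpy as np

# Hypothetical "experimental" means, "simulation" outputs, and a diagonal
# measurement covariance; purely illustrative values.
exp_mean = np.array([1.0, 0.5, 0.2])
sim_out = np.array([1.2, 0.45, 0.35])
cov = np.diag([0.04, 0.01, 0.0025])

diff = sim_out - exp_mean
# Euclidean distance: every squared error contributes equally.
d_euclid = np.sqrt(diff @ diff)
# Mahalanobis distance: each error is scaled by the measurement scatter,
# so a small error in a precisely measured component can dominate.
d_mahal = np.sqrt(diff @ np.linalg.inv(cov) @ diff)
```

Here the third component, with the smallest variance, contributes most to the Mahalanobis distance even though its raw error is moderate.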

Additionally, a high-fidelity database is required to validate a numerical method or assess the capability of a physical model. Instead of building a new test rig or running a high-fidelity simulation, such as a direct numerical simulation (DNS), which may not be practical, utilising historical experimental databases is a more feasible and economical choice. In most cases, it is impossible to use measurements directly in a quantitative validation, because of the misalignment between the measurements and the numerical outputs in the spatial or parameter domain. Barone et al. (2006), as well as Oberkampf and Barone (2006), suggested first fitting a non-linear regression model to the measurements and then using the fitted curve in place of the experimental measurements. Accordingly, the confidence interval on the model error, as well as the global agreement metric, can be calculated even when the measurements are sparse over the range of input parameters. One of the major limitations of this approach comes from the regression method: the functional form chosen for the non-linear regression has a large impact on the results. Based on the Bayesian calibration procedure developed by Kennedy and O'Hagan (2001), Wang et al. (2009) suggested a framework using Gaussian process regression (GPR) to fit the curve as well as the confidence interval for the model error. The procedure also provides the mean and confidence interval of the model bias (error) over the observation domain. As a further development of Wang et al. (2009), Chen et al. (2008) developed a metric based on the p-value test designed to determine the accuracy of a model for design purposes. These existing methods are not suited to ranking the performance of different numerical models.

This work focuses on quantitatively assessing the difference between experiment and simulations, as well as ranking the numerical models used. It consists of two steps: curve-fitting of the observations using GPR, and evaluation of the distance between the fitted curve and the numerical outputs using metrics. The statistically weighted squared error represents the local distance, whilst the standardised Euclidean distance provides the overall distance between the model outputs and the fitted curve. The numerical models are then ranked in terms of performance by comparing their standardised Euclidean distances.

In current practice, CFD models are treated as deterministic. However, a numerical model is affected by many uncertainties arising from inadequate knowledge of the physical system, e.g., variability in physical properties (Rebba and Mahadevan, 2008). Exploring the stochastic behaviour of a variable using a numerical method requires hundreds or even thousands of simulations. However, CFD simulations are often computationally expensive, especially scale-resolved simulations. As a result, it is often too expensive to recreate the probability distributions of a variable purely using CFD simulations. It is possible instead to capture the stochastic behaviour using a surrogate model that mimics the CFD model in the event domain. This will be pursued in our future work.

The rest of this paper is organised as follows: a brief introduction to the GPR method is given in Section 2. The definitions of the statistically weighted squared error and the standardised Euclidean distance are given in Section 3. Section 4 describes the experiment by Volvo (Sjunnesson et al., 1992) and the CFD models. The GPR predictions are validated in Section 5 before the quantification procedure is demonstrated in Section 6. General conclusions are given in Section 7.

Section snippets

Gaussian process regression

The GPR method, also known as kriging (Krige, 1951), provides interpolation of unknown values based on prior knowledge. Compared to parametric regression methods, such as least-squares linear regression and polynomial regression, GPR is a more rigorous method for the treatment of complex noisy non-linear functions (Chilenski et al., 2015). Instead of using a prescribed functional form for the regression function, GPR uses the prior information (the known data) to estimate the posterior (unknown
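The posterior construction sketched above can be written compactly. The following is a minimal illustration, not the implementation used in the paper: the squared-exponential kernel and the fixed hyperparameters (length-scale, signal and noise variances) are assumptions made for this sketch, and in practice they would be optimised, e.g. by maximising the marginal likelihood.

```python
import numpy as np

def rbf(a, b, ell=0.5, sf2=1.0):
    """Squared-exponential kernel; ell and sf2 are assumed hyperparameters."""
    d = a[:, None] - b[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

def gpr_posterior(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and variance of a GP conditioned on (x_train, y_train)."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    Kss = rbf(x_test, x_test)
    L = np.linalg.cholesky(K)                 # stable solve via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha                       # posterior mean at test points
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss - v.T @ v)              # posterior variance (uncertainty)
    return mean, var

# Toy data: noisy-free samples of a smooth function (illustrative only).
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2.0 * np.pi * x)
mu, sig2 = gpr_posterior(x, y, np.array([0.5]))
```

The posterior variance is what makes GPR attractive here: it quantifies how trustworthy the pseudo-measurement is at each interpolated position.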

Distance between two datasets

The Euclidean distance is sufficient for describing the difference between two deterministic datasets. In the calculation of the Euclidean distance, each data point contributes equally to the value. However, experimental observations are generally subject to random fluctuations of different magnitudes. As a result, the Euclidean distance is not suitable for assessing the agreement between experimental observations and numerical outputs. It is reasonable to weight the coordinates subject to greater
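Under the definitions commonly used for these metrics (the paper's exact normalisation may differ), the local statistically weighted squared error and the global standardised Euclidean distance could be sketched as follows, with the weights taken from the GPR predictive variance at each comparison point. All numbers are illustrative:

```python
import numpy as np

def weighted_sq_error(sim, gpr_mean, gpr_var):
    """Local metric: squared error at each point, weighted by the variance."""
    return (sim - gpr_mean) ** 2 / gpr_var

def standardised_euclidean(sim, gpr_mean, gpr_var):
    """Global metric: standardised Euclidean distance over all points."""
    return np.sqrt(np.sum(weighted_sq_error(sim, gpr_mean, gpr_var)))

# Illustrative CFD outputs, GPR means, and GPR predictive variances.
sim = np.array([1.1, 0.9, 0.4])
mean = np.array([1.0, 1.0, 0.5])
var = np.array([0.01, 0.01, 0.04])

w = weighted_sq_error(sim, mean, var)
d = standardised_euclidean(sim, mean, var)
```

Note how the third point, despite the same raw error as the others, contributes less because the GPR is less certain there.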

Descriptions of the demonstration case

The isothermal, non-reacting prism bluff-body flow measured by Volvo (Sjunnesson et al., 1992, 1991) is simulated using different scale-resolving CFD methods. The experimental facility, the flow conditions, and the set-up of the CFD models are described in this section.

LOO-CV of the GPR model

Before training the GPR models, it is useful to explore the shape of the measured profiles. Profiles of U/Ub, urms/Ub and vrms/Ub in the y-direction should be symmetric about y/a = 0.0. Hence, it is reasonable to reflect the observations of U/Ub, urms/Ub and vrms/Ub about y/a = 0.0. Moreover, the measurements of uv/Ub2 are rotated 180° around the point (uv/Ub2 = 0, y/a = 0.0). The rotated measurements are then added to the original dataset. In this way, more data points are obtained for the
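The symmetry-based augmentation described above can be sketched as follows. The arrays are illustrative stand-ins, not the Volvo measurements:

```python
import numpy as np

def reflect_symmetric(y, q):
    """Mirror a symmetric profile (e.g. U/Ub, urms/Ub) about y/a = 0."""
    return np.concatenate([y, -y]), np.concatenate([q, q])

def rotate_antisymmetric(y, q):
    """Rotate an antisymmetric profile (e.g. uv/Ub^2) 180 deg about (0, 0)."""
    return np.concatenate([y, -y]), np.concatenate([q, -q])

# Illustrative half-profiles measured at positive y/a only.
y = np.array([0.1, 0.3, 0.5])
u = np.array([0.8, 0.9, 1.0])     # symmetric quantity
uv = np.array([0.02, 0.05, 0.0])  # antisymmetric quantity

y_sym, u_sym = reflect_symmetric(y, u)
y_rot, uv_rot = rotate_antisymmetric(y, uv)
```

Doubling the dataset in this way gives the GPR model more support near the symmetry plane without any new measurements.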

Quantification of the agreement between the experiment and CFD models

The agreement between the experiment (Sjunnesson et al., 1992, 1991) and the simulations is examined in this section using the quantities at the previously defined locations, as shown in Fig. 2(b). The difference between the experimental measurements and the numerical outputs is quantified by the distance between the GPR predictions and the CFD outputs. The profiles of the quantities of interest (QoIs) obtained by the various turbulence models, as well as the statistically weighted squared error ((γi*)2/var(

Summary and conclusions

A GPR based method for measuring the agreement between the outputs of CFD simulations and experimental measurements is proposed in this paper. The GPR model, trained and validated using the measurements, provides pseudo-experimental measurements at positions in the simulation where no direct experimental measurements exist. The differences between the numerical models and the experiment are mimicked by comparing the numerical outputs to the predictions of the validated GPR models. Quantified information is

Declaration of Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors would like to thank Rolls-Royce for the financial support via the project grant P65165_MECM.

References (63)

  • B.E. Launder et al.

    The numerical computation of turbulent flows

    Comput. Methods Appl. Mech. Eng.

    (1974)
  • Y. Ling et al.

    Quantitative model validation techniques: new insights

    Reliab. Eng. Syst. Saf.

    (2013)
  • H. Liu et al.

    Remarks on multi-output Gaussian process regression

    Knowl. Based Syst.

    (2018)
  • W.L. Oberkampf et al.

    Measures of agreement between computation and experiment: validation metrics

    J. Comput. Phys.

    (2006)
  • W.L. Oberkampf et al.

    Verification and validation benchmarks

    Nucl. Eng. Des.

    (2008)
  • R. Rebba et al.

    Computational methods for model reliability assessment

    Reliab. Eng. Syst. Saf.

    (2008)
  • R. Rebba et al.

    Validation of models with multivariate output

    Reliab. Eng. Syst. Saf.

    (2006)
  • A. Revell et al.

    A stress strain lag eddy viscosity model for unsteady mean flow

    Int. J. Heat Fluid Flow

    (2006)
  • T.-H. Shih et al.

    A new k-epsilon eddy viscosity model for high Reynolds number turbulent flows

    Comput. Fluids

    (1995)
  • L. Zhao et al.

    Validation metric based on Mahalanobis distance for models with multiple correlated responses

    Reliab. Eng. Syst. Saf.

    (2017)
  • Guide for the Verification and Validation of Computational Fluid Dynamics Simulations

    (1998)
  • M.M. Amin et al.

    Large eddy simulation study on forced convection heat transfer to water at supercritical pressure in a trapezoid annulus

    J. Nucl. Eng. Radiat. Sci.

    (2018)
  • Guide for verification and validation in computational solid mechanics

    Am. Soc. Mech. Eng. PTC

    (2006)
  • M.F. Barone et al.

    Validation case study: prediction of compressible turbulent mixing layer growth rate

    AIAA J.

    (2006)
  • W. Chen et al.

    A design-driven validation approach using Bayesian prediction models

    J. Mech. Des.

    (2008)
  • M.A. Chilenski et al.

    Improved profile fitting and quantification of uncertainty in experimental measurements of impurity transport coefficients using Gaussian process regression

    Nucl. Fusion

    (2015)
  • C. Conficoni et al.

    Energy-aware cooling for hot-water cooled supercomputers

  • Y. Duan et al.

    A validation of CFD methods on predicting valve performance parameters

  • Y. Duan et al.

    Assessments of different turbulence models in predicting the performance of a butterfly valve

  • Y. Duan et al.

    Large eddy simulation of a buoyancy-aided flow in a non-uniform channel – Buoyancy effects on large flow structures

    Nucl. Eng. Des.

    (2017)
  • Y. Duan et al.

    Heat transfer of a buoyancy-aided turbulent flow in a trapezoidal annulus

    Int. J. Heat Mass Transf.

    (2017)