Accelerating Gaussian Process surrogate modeling using Compositional Kernel Learning and multi-stage sampling framework
Introduction
Simulation models, such as Finite Element Analysis (FEA) and Computational Fluid Dynamics (CFD), are mathematical representations of real-world physical problems implemented in computer code. Nowadays, simulation models are extensively used in various types of engineering problems (e.g., domain exploration, design optimization, sensitivity/uncertainty analysis and inverse analysis) [1], [2], [3], [4], because physical experiments are either highly expensive or technically impossible.
As physical knowledge and computing power advance, increasingly sophisticated simulation models are gaining widespread use for complex engineering problems. These models typically have non-linear, complex response surfaces over large input spaces [5], [6]. In addition, they often require substantial computational resources (i.e., long run-times on powerful hardware). When the analysis of the simulation model must be iterated many times, the computational process becomes highly challenging under limited resources.
To mitigate this computational burden, surrogate models have gained considerable attention as cost-effective substitutes for the simulation model [7], [8], [9], [10]. Since a deterministic simulation model produces identical outputs for identical inputs [3], its response surface can be captured by a mathematical/statistical representation [11]. This representation is referred to as a surrogate model, also known as a response surface model, emulator or meta-model. Once constructed, the surrogate model can be used for design optimization, design space exploration and sensitivity/uncertainty analysis without running additional simulations.
Based on the purpose of the engineering problem, surrogate modeling can be categorized into (1) global surrogate modeling and (2) black-box optimization. Global surrogate modeling aims to mimic the response surface over the entire input space (e.g., for sensitivity or uncertainty analysis) [12], [13], [14], while black-box optimization uses a sequential design strategy for global optimization of black-box functions [15], [16], [17]. The scope of this study is confined to global surrogate modeling, which consists of two stages: (1) a sampling stage, wherein a set of simulation runs (the training samples) is performed over the input space according to a sampling strategy; and (2) a model-fitting stage, wherein the surrogate model is fitted to the training samples. Among the many surrogate modeling techniques and sampling methods, selecting robust methods is still challenging for practical problems [14].
For successful global surrogate modeling, the learning capability of the surrogate model is important. Various types of surrogate models exist, such as polynomial models (POLY) [18], [19], [20], radial basis functions (RBF) [19], [20], [21], [22], [23], [24], [25] and Gaussian Processes (GP) [19], [20], [22], [23], [25], [26], [27], [28], [29], [30], [31]. Table 1 chronologically summarizes recent comparative studies of surrogate modeling. Improvements in computational power have significantly increased research interest in more advanced surrogate models with better learning capability. As a result, non-parametric models such as the GP (also known as Kriging) and RBF are prevalent for global surrogate modeling, since they can approximate the response surface more flexibly than parametric models (e.g., polynomial models) [9]. Recently, enhanced GPs (such as Blind Kriging and Gradient-enhanced Kriging) have gained considerable attention in engineering problems [13], [24], [28], [29], [30], [31], [32], [33]. Comprehensive studies of surrogate modeling [14], [34] show that the optimal model depends on the problem type and modeling settings (e.g., the kernel function in a GP).
A sampling method (also known as design of experiments, DOE) generates the training samples, gathering informative experiments (simulations) for surrogate modeling. The accuracy of the surrogate model heavily depends on the training samples, so the sampling method is crucial to its predictive quality. Classical DOE methods (e.g., central composite design) can be used to generate training samples, but they tend to place more samples near the boundary of the input space. For a computational DOE, it is preferable to fill the entire input space (i.e., space-filling) rather than the boundary regions [35]. In this context, space-filling sampling (SFS) methods have gained much popularity for surrogate modeling. Latin hypercube sampling (LHS) [36] and low-discrepancy sequences (e.g., Sobol' sequences) [37] are the most popular SFS methods in various fields.
A conventional SFS method is a single-stage strategy that generates all samples at once. The optimal LHS was developed to improve space-filling properties: it optimizes space-filling criteria (such as the maximin distance criterion [38], [39], orthogonal array criteria [40], [41] and others [42], [43], [44]) when generating the training samples. In conventional SFS, the size of the training set must be pre-determined; however, it is difficult even for experts to choose an appropriate size in advance. This difficulty has motivated the development of sequential SFS methods [5], [45], [46], [47], [48], [49]. Table 2 shows the development of SFS methods in chronological order. Sequential SFS methods build on conventional SFS to augment the training samples. To ensure the desired space-filling property, they treat the sampling process as a series of optimization problems over space-filling criteria. Notably, sequential SFS methods with a nested design have recently gained considerable attention. A nested design sequentially generates successive sets of samples, making each former set a subset of the latter [5], [45], [46], [48], [49].
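As a concrete illustration of the maximin-distance idea mentioned above, the sketch below generates several random Latin hypercube designs and keeps the one whose minimum pairwise distance is largest. This is a random-restart search rather than a formal optimizer, and all function names are illustrative, not from the paper:

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Generate n points in [0, 1)^d with exactly one point per axis stratum."""
    # Each column is a random permutation of the n strata, jittered within each cell.
    u = rng.random((n, d))
    perms = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (perms + u) / n

def maximin_lhs(n, d, n_candidates=50, seed=0):
    """Among random LHS candidates, keep the design maximizing the minimum pairwise distance."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        x = latin_hypercube(n, d, rng)
        dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        score = dists[np.triu_indices(n, k=1)].min()  # minimum pairwise distance
        if score > best_score:
            best, best_score = x, score
    return best

x = maximin_lhs(20, 2)
```

Because every candidate is a Latin hypercube, the projective (one-point-per-stratum) property is preserved regardless of which candidate wins the maximin criterion.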
The accuracy of the surrogate model depends strongly on (1) the learning capability of the surrogate model and (2) the training samples, and it is often insufficient to represent the response surface of the simulation model. The training samples should be sufficient to capture the response surface, while the learning capability of the surrogate model should be maximized to learn it effectively. In general, the two factors interact and jointly influence the accuracy of the surrogate model. For example, a large training set is required to achieve reasonable accuracy when the learning capability is inefficient. It is therefore important to validate the accuracy of the surrogate model before its implementation. However, there is little research on diagnostics of surrogate models [3], [13], [51].
To address these issues simultaneously, this paper proposes a new GP-based surrogate modeling method that incorporates Compositional Kernel Learning (CKL) [52], [53], [54] into a sequential SFS strategy termed Progressive Latin Hypercube Sampling (PLHS) [5]. The CKL was developed by Duvenaud et al. [52] in the machine learning community. The covariance kernels of the GP are closed under compositional rules (i.e., sum and product) [52]; thus, the CKL automatically discovers a richly structured compositional kernel to represent complex properties of the function. Although the CKL excels at learning both simple and complex functions, it is relatively new to surrogate modeling. To diagnose the GP with an appropriate size of training samples, the proposed method introduces the PLHS, which successively generates a series of sub-samples (i.e., smaller slices) while maintaining the desired distributional properties (space-filling and projective). Sheikholeslami et al. [5] demonstrated that the PLHS performs outstandingly and scales effectively with the dimensionality of the problem. Each sub-sample in the PLHS is itself a Latin hypercube [5], so the sub-samples preserve projective properties (i.e., the Latin hypercube structure) along with space-filling properties (i.e., the maximin distance criterion). For diagnostics of the surrogate model, the proposed method uses two consecutive PLHS sub-samples as training and validation samples, respectively. By virtue of the nested samples in the PLHS, the proposed method allows users to monitor the diagnostics of the GP and assess the stopping criteria for further sampling.
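To illustrate the compositional search that the CKL performs, the following sketch greedily grows a kernel by summing or multiplying fixed-hyperparameter base kernels, scoring each candidate by the GP log marginal likelihood. This is a deliberately simplified stand-in for the method of Duvenaud et al. [52], which also optimizes hyperparameters and scores with BIC; all names here are illustrative:

```python
import numpy as np

# Base kernels on 1-D inputs (hyperparameters fixed for brevity;
# a real CKL implementation would optimize them at every step).
def rbf(X, Z):
    return np.exp(-0.5 * (X[:, None] - Z[None, :]) ** 2)

def lin(X, Z):
    return X[:, None] * Z[None, :]

def per(X, Z):
    return np.exp(-2.0 * np.sin(np.pi * np.abs(X[:, None] - Z[None, :])) ** 2)

BASE = {"RBF": rbf, "LIN": lin, "PER": per}

def log_marginal(K, y, noise=1e-2):
    """GP log marginal likelihood for kernel matrix K and targets y."""
    n = len(y)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def greedy_ckl(X, y, depth=2):
    """Greedily expand the kernel via sum/product rules (CKL-style search)."""
    best_name, best_fn = max(
        BASE.items(), key=lambda nf: log_marginal(nf[1](X, X), y))
    for _ in range(depth - 1):
        cands = []
        for name, fn in BASE.items():
            g = best_fn  # capture current best before expanding
            cands.append((f"({best_name}+{name})",
                          lambda A, B, f=fn, g=g: g(A, B) + f(A, B)))
            cands.append((f"({best_name}*{name})",
                          lambda A, B, f=fn, g=g: g(A, B) * f(A, B)))
        name2, fn2 = max(cands, key=lambda nf: log_marginal(nf[1](X, X), y))
        if log_marginal(fn2(X, X), y) <= log_marginal(best_fn(X, X), y):
            break  # no improvement in evidence: stop expanding
        best_name, best_fn = name2, fn2
    return best_name, best_fn

X = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * X)
name, kernel = greedy_ckl(X, y)  # discovered expression, e.g. a PER-based composite
```

Closure under sum and product guarantees that every composite candidate is again a valid covariance kernel, which is what makes this search well-posed.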
Numerical experiments reveal that (1) the proposed method generally outperforms, or performs comparably to, the best of a set of benchmark surrogate models, showing that it can flexibly and efficiently learn response surfaces of various scales and complexities; and (2) only the proposed method provides robust correlations between the accuracies measured on validation samples (generated by the PLHS) and on test samples (not available in real applications). These results indicate that only the proposed method ensures a diagnostic measure for global surrogate modeling via the proposed framework.
The remainder of this paper is organized as follows. Section 2 first introduces the Gaussian process model with the CKL; the PLHS is then presented for diagnosing the GP and finding an appropriate size of training samples. Section 3 introduces the proposed surrogate modeling method. Section 4 describes the numerical experiments, and Section 5 provides their results. Section 6 discusses the proposed method. Lastly, Section 7 summarizes the conclusions. Hereafter, boldface letters denote vectors or matrices.
Section snippets
Gaussian Processes
A Gaussian Process (GP) is a Bayesian non-parametric model that provides an analytically tractable way of learning a complex function from input to output [26]. A GP is a distribution over functions such that any finite set of function values has a joint multivariate Gaussian distribution. In this context, a GP is completely defined by a mean function m(x) and a covariance kernel k(x, x′). The response surface f(x) is assumed to be a finite set of function values f = [f(x₁), …, f(xₙ)]ᵀ with inputs x:

f(x) ~ GP(m(x), k(x, x′))
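A minimal sketch of GP regression under these definitions (zero mean, squared-exponential kernel with fixed hyperparameters; an actual implementation would learn the hyperparameters by maximizing the marginal likelihood):

```python
import numpy as np

def rbf(X, Z, ell=0.2):
    """Squared-exponential covariance kernel on 1-D inputs."""
    return np.exp(-0.5 * ((X[:, None] - Z[None, :]) / ell) ** 2)

def gp_posterior(X_train, y_train, X_test, kernel, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = kernel(X_train, X_test)
    K_ss = kernel(X_test, X_test)
    L = np.linalg.cholesky(K)                                  # K = L Lᵀ
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K⁻¹ y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

X = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * X)
Xs = np.linspace(0.0, 1.0, 50)
mu, var = gp_posterior(X, y, Xs, rbf)
```

The posterior variance is what later serves as the prediction-uncertainty measure that distinguishes the GP from deterministic surrogates such as RBF interpolation.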
Proposed method using CKL and PLHS for surrogate modeling
Although the GP has been widely used for surrogate modeling due to its advantages (e.g., prediction uncertainty), three difficulties remain: (1) the choice of a proper covariance kernel; (2) the appropriate size of the training samples; and (3) diagnostics for accuracy. To address these difficulties simultaneously, this study proposes an efficient GP-based surrogate modeling method integrating the CKL with the PLHS. Fig. 6 shows the flowchart of the proposed method.
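The overall loop can be sketched as follows. For brevity, this sketch substitutes an inverse-distance-weighted interpolant for the CKL-fitted GP and independent Latin hypercube slices for true nested PLHS slices; it only illustrates the train-on-accumulated-slices, validate-on-next-slice, stop-on-accuracy structure, and every function name is illustrative rather than from the paper:

```python
import numpy as np

def lhs(n, d, rng):
    """Plain Latin hypercube slice (a true PLHS would nest successive slices)."""
    return (np.column_stack([rng.permutation(n) for _ in range(d)])
            + rng.random((n, d))) / n

def idw_predict(X_tr, y_tr, X_te, eps=1e-12):
    """Inverse-distance-weighted surrogate, a cheap stand-in for the CKL-fitted GP."""
    dist = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=-1)
    w = 1.0 / (dist + eps)
    return (w * y_tr).sum(axis=1) / w.sum(axis=1)

def sequential_surrogate(f, d, slice_size=20, max_slices=10, tol=0.05, seed=0):
    """Fit on accumulated slices; validate on the next slice; stop when accurate."""
    rng = np.random.default_rng(seed)
    X_tr = lhs(slice_size, d, rng)
    y_tr = f(X_tr)
    for _ in range(1, max_slices):
        X_val = lhs(slice_size, d, rng)   # next slice doubles as validation set
        y_val = f(X_val)
        rmse = np.sqrt(np.mean((idw_predict(X_tr, y_tr, X_val) - y_val) ** 2))
        if rmse < tol * y_tr.std():        # stopping criterion on validation accuracy
            return X_tr, y_tr, rmse
        # Not accurate enough: absorb the validation slice into the training set.
        X_tr = np.vstack([X_tr, X_val])
        y_tr = np.concatenate([y_tr, y_val])
    return X_tr, y_tr, rmse

X_tr, y_tr, rmse = sequential_surrogate(lambda X: X.sum(axis=1), d=2)
```

The key design point mirrored here is that each validation slice is recycled into the training set once it has served its diagnostic purpose, so no simulation run is wasted.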
Test function and their characteristics
To compare the proposed method with other methods, nine test functions were selected from the literature to cover a range of dimensionality and complexity. Their mathematical representations are summarized in Appendix B. These test functions are well-known benchmark problems in surrogate modeling and optimization. Table 3 summarizes their characteristics. In terms of dimensionality (d), the test functions can be categorized into three levels:
Results and analysis
To account for sampling variability, ten different replicates of the training samples were generated with different random seeds. Numerical experiments were performed on these ten replicates to evaluate robustness against the random components of the proposed method. The proper size of the training samples depends on the problem type and computational budget. Since there is no optimal way to determine the size of the training samples, the empirical formula (Eq. (18))
Discussion on proposed method
This section discusses two issues related to the proposed method: (1) computational complexity and (2) limitations for discontinuous response surfaces.
– Computational complexity
The computational complexity of the GP grows rapidly with the sample size n: training requires O(n³) operations to factorize the n×n covariance matrix. Although the CKL provides superior learning capability, as seen in Section 5.1, the iterative fitting of the CKL aggravates this computational cost. Therefore, the proposed method is
Conclusions
This study proposed a sequential surrogate modeling method using Compositional Kernel Learning (CKL) with Progressive Latin Hypercube Sampling (PLHS). The proposed method improves learning capability for response surfaces while providing a diagnostic measure for the global accuracy of the Gaussian Process (GP). It employs the CKL to automatically discover the proper covariance kernel from the observed samples. Until the desired accuracy of the GP is achieved, the PLHS is implemented
CRediT authorship contribution statement
Seung-Seop Jin: Conceptualization, Methodology, Software, Writing, Revision, Supervision, Project administration, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1C1C1009236).
References (73)
Surrogate-based analysis and optimization. Prog. Aerosp. Sci. (2005)
Progressive Latin Hypercube Sampling: An efficient approach for robust sampling-based analysis of environmental models. Environ. Model. Softw. (2017)
Sequential surrogate modeling for efficient finite element model updating. Comput. Struct. (2016)
Radio-frequency inductor synthesis using evolutionary computation and Gaussian-process surrogate modeling. Appl. Soft Comput. (2017)
Bayesian analysis of computer code outputs: A tutorial. Reliab. Eng. Syst. Saf. (2006)
Comparative study of metamodelling techniques in building energy simulation: Guidelines for practitioners. Simul. Model. Pract. Theory (2014)
High dimensional kriging metamodelling utilising gradient information. Appl. Math. Model. (2016)
Efficient uncertainty quantification for a hypersonic trailing-edge flap, using gradient-enhanced kriging. Aerosp. Sci. Technol. (2018)
Blind Kriging: Implementation and performance analysis. Adv. Eng. Softw. (2012)
Bayesian model averaging for kriging regression structure selection. Probab. Eng. Mech. (2019)
An enhanced Kriging surrogate modeling technique for high-dimensional problems. Mech. Syst. Signal Process.
A comparison of six metamodeling techniques applied to building performance simulations. Appl. Energy
Minimax and maximin distance designs. J. Statist. Plann. Inference
Projection array based designs for computer experiments. J. Statist. Plann. Inference
Algorithmic construction of optimal symmetric Latin hypercube designs. J. Statist. Plann. Inference
Efficient space-filling and near-orthogonality sequential Latin hypercube for computer experiments. Comput. Methods Appl. Math.
A review and comparison of four commonly used Bayesian and maximum likelihood model selection tools. Ecol. Model.
Numerical assessment of metamodelling strategies in computationally intensive optimization. Environ. Model. Softw.
A systematic approach to determining metamodel scope for risk-based optimization and its application to water distribution system design. Environ. Model. Softw.
Realization of learning induced self-adaptive sampling in noisy optimization. Appl. Soft Comput.
Adaptive surrogate modeling with evolutionary algorithm for well placement optimization in fractured reservoirs. Appl. Soft Comput.
Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B Stat. Methodol.
Diagnostics for Gaussian process emulators. Technometrics
Verification and Validation in Scientific Computing
A survey of adaptive sampling for global metamodeling in support of simulation-based complex engineering design. Struct. Multidiscip. Optim.
Bayesian design and analysis of computer experiments: Use of derivatives in surface prediction. Technometrics
A taxonomy of global optimization methods based on response surfaces. J. Global Optim.
Gaussian Processes with Optimal Kernel Construction for Neuro-Degenerative Clinical Onset Prediction
Application of surrogate models in estimation of storm surge: A comparative assessment. Appl. Soft Comput.
Performance evaluation of metamodelling methods for engineering problems: towards a practitioner guide. Struct. Multidiscip. Optim.
Efficient global optimization of expensive black-box functions. J. Global Optim.
Predictive entropy search for efficient global optimization of black-box functions
Surrogate optimization of computationally expensive black-box problems with hidden constraints. INFORMS J. Comput.
Empirical Model-Building and Response Surfaces
Metamodeling method using dynamic kriging for design optimization. AIAA J.
Radial Basis Functions: Theory and Implementations