1 Introduction

Modern engineering design is a complex and systematic task. Different disciplines are involved and various goals are set depending on the design stage. In the early stage of the design pipeline, a trade-off between conflicting design requirements must be found to generate satisfying solutions for the problem at hand. Without loss of generality, these requirements are presented as constraints. The designs satisfying such constraints are considered feasible and can be good candidates for detailed investigation. In practice, Autodesk [1] pointed out that, in the preliminary design stage, designers use design optimization (e.g., [2, 3]) as a heuristic design tool. These algorithms were originally proposed to push the performance of a design problem to its extreme as a finalizing step in the design process [4], but optimization is now being misused to explore the space of feasible designs at the beginning of the design cycle. This interesting phenomenon stems from the need for a generative design process in which designers are presented with several interesting design candidates.

Nevertheless, the sole emphasis on optimality by optimization algorithms may not be the best choice at this stage, and there is a need for techniques that identify all the suitable designs. Several arguments support this claim. First, the optimal design configuration may only serve as the start of the design process [4]; being too myopic at a limited number of optima may ignore some interesting design candidates. Second, especially at the early design stage, the objectives of the system that need to be optimized may not yet be clearly defined [5] and designers need a strategy to gain understanding of the design space. In this case, unconventional but feasible designs may also be valuable and worth further investigation. Additionally, the obtained result may not transfer well to reality [6] (a phenomenon called the reality gap [7]). In these cases, a large collection of suitable design configurations can quickly provide a working solution.

This alternative interpretation can be formulated as a feasible region identification problem, where one seeks to accurately predict whether a design configuration is feasible or not:

$$\begin{aligned} \begin{aligned}&\text {Find}\ A := \{{\varvec{x}} \in {\mathcal {X}} : I_A(f({\varvec{x}}))=1\}\\&\text {where}\ I_A(f({\varvec{x}}))= {\left\{ \begin{array}{ll} 1&{} \alpha \le f({\varvec{x}}) \le \beta \\ 0&{} \text {otherwise} \end{array}\right. } \end{aligned} \end{aligned}$$
(1)

\(f: {\mathcal {X}}\rightarrow {\mathbb {R}}\) is a noise-free, continuous, black box constraint function defined on a compact set \({\mathcal {X}} \subset {\mathbb {R}}^d\). \(I_A(\cdot )\) is an indicator function defined on an interval of the output space explicitly specified by the designer, which denotes whether the design requirements are satisfied. In the bounded space \({\mathcal {X}}\), a design \({\varvec{x}}\) is called feasible if its corresponding constraint function value \(f({\varvec{x}})\) lies in the interval \([\alpha , \beta ]\). A region is called a feasible region if all designs within it are feasible. Specifically, we denote \(\alpha , \beta \in {\mathbb {R}} \cup \{-\infty , \infty \}\) as the feasible boundary, while \(\{{\varvec{x}} \in {\mathcal {X}} : f({\varvec{x}}) \in \{\alpha , \beta \} \}\) is denoted as the feasible region boundary. In general, more than one constraint f can be applied in a design problem, in which case the feasible region is the intersection of the regions corresponding to each individual f.
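
As a minimal illustration of Eq. (1), the indicator below checks whether a design satisfies a hypothetical constraint; the constraint function and bounds are placeholders, not taken from the applications of Sect. 4:

```python
import numpy as np

def is_feasible(f_value, alpha=-np.inf, beta=1.0):
    # Indicator I_A of Eq. (1): 1 if alpha <= f(x) <= beta, 0 otherwise.
    return int(alpha <= f_value <= beta)

# Hypothetical constraint for illustration: the squared norm of the design vector.
f = lambda x: float(np.sum(np.asarray(x) ** 2))

print(is_feasible(f([0.3, 0.4])))  # 1: 0.25 lies below beta = 1.0
print(is_feasible(f([1.2, 0.9])))  # 0: 2.25 exceeds beta = 1.0
```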

In this framework, computer-aided design (CAD) simulations are often involved to evaluate whether the constraints are satisfied, leading to feasible design configurations. However, given the complexity of modern engineering systems, CAD simulations can be both time- and resource-intensive. For instance, an electromagnetic simulator or a computational fluid dynamics solver can take hours [8] or even days [9] to perform a single evaluation of f. Hence, it is of paramount importance to limit the number of (expensive) evaluations during the design phase, while also being able to provide a large number of feasible designs with high accuracy. In this scenario, active learning [10, 11] (or adaptive sampling [12,13,14]) is an elegant tool to solve this problem. Active learning (AL) is also known as sequential experimental design in statistics [15]: it sequentially extends the data set to explore the design space in a sample-efficient way, in order to reduce uncertainty with respect to certain properties. Such desired properties can be the location of a function's maximum, as in Bayesian optimization (or surrogate-based optimization) [16, 17], the overall accuracy of a regression model [18], or a feasible region satisfying all constraints. The last case is considered in our study.

In this paper, we introduce an information theoretic-based active learning framework to make accurate feasibility predictions within a bounded design space. To reach an efficient termination of the active learning process, a new framework of termination criterion, which we alternatively refer to as stopping criterion, is proposed to provide a probabilistic belief of the metamodel’s prediction accuracy. We theoretically investigate the relation between the proposed stopping criterion and the test accuracy metrics commonly used for binary classification problems. The performance of our novel method is validated through comparisons with state-of-the-art methods on synthetic experiments. Additionally, two real-world design problems, including one involving multiple design constraints, are also presented.

This paper is structured as follows. First, the background and related techniques are described in Sect. 2. The novel active learning framework, including the active learning method and the framework of the stopping criteria, is introduced in Sect. 3. The numerical experiments and real-world applications are presented in Sect. 4. Conclusions are summed up in Sect. 5.

2 Background and related work

Fig. 1 Generic flowchart of active learning for feasible region identification

The generic active learning framework for feasible region identification problems is shown in Fig. 1. Given the initial samples, the corresponding function values are evaluated by the expensive black box function. Subsequently, a metamodel is fitted to the data. Next, a new sample of the design parameters is selected according to some specified criterion built upon the model (infill criterion or acquisition function). The key characteristics of active learning are twofold: the model provides a description of the feasible region, and the model prediction can be used to formulate an acquisition function, which provides a measure of how informative an unevaluated sample can be. The acquisition function is then used to choose the next sample to evaluate. The model is updated accordingly to provide a new (and more accurate) description of the feasible region. This process is repeated until a stopping criterion is met. Finally, the feasible region is accurately described by the metamodel.

2.1 Metamodels for characterizing feasibility

A metamodel, or surrogate model, is a numerical approximation of the black box constraint function. Making feasibility predictions with metamodels is a common strategy. This problem is closely related to reliability analysis [19], where the computational complexity mainly stems from the statistical estimation of f in Eq. (1). In other contexts, this problem is also widely studied [20] and is referred to as excursion set [21, 22] and level set [23, 24] estimation, where one wants to predict whether an unevaluated observation’s outcome exceeds a specified level. From Eq. (1), a unified view can be given by regarding level set estimation, excursion search and feasible region identification as the same problem.

To make accurate predictions, several approaches have been proposed in the literature [5, 13, 25, 26]. These methodologies are regression-based, rather than classification-based, since f is a real-valued continuous function in many relevant engineering applications. We refer to [11, 27, 28] for classification-based methods in case f is discrete-valued.

2.2 Active feasible region identification

Once the metamodel has been constructed, active learning will improve the confidence in predicting the feasible region. Several techniques have been presented in the literature to actively identify the feasible region. An early one actively selects queries to learn the boundaries that divide a function domain into regions where the function is above or below a given threshold [26]. A heuristic acquisition function has also been proposed to encourage sampling near the feasible region boundary, in locations that simultaneously exhibit large uncertainty. A weighted integrated mean square error (\(\text {IMSE}_{T}\)) strategy is proposed by Picheny [25] to measure the belief of the metamodel about the accuracy of the feasible region. A new sampling technique is then developed, based on the fact that the model variance is not affected by the function response of new data. Several stepwise uncertainty reduction (SUR) strategies for the excursion set estimation problem have also been developed in [20], which employ sequential sampling to minimize the estimation error or uncertainty, measured by Bayesian risk. The consistency properties of some SUR methods have also been established recently [29]. A variant of the Probability of Feasibility (PoF) [30] has been proposed in [31], where a prediction variance term has been heuristically added to encourage exploration within the design space. Since most of these works assume the presence of the feasible region within a bounded design space, an active expansion sampling technique is introduced in [11] for unbounded design space problems.

2.3 Stopping criterion

Considering the active learning framework in Fig. 1, it is fundamental to define a metric to stop the iterative refinement of the metamodel.

To define such a stopping criterion, it is necessary to assess when the feasible region is accurately characterized by the metamodel, i.e., whether the metamodel can accurately predict if a design configuration is feasible or not. As different performance metrics are used for regression and classification problems, first the proper problem type must be determined. The goal is to predict whether a data point is within the feasible region: \(I_A(f(\cdot ))\rightarrow \{0, 1\}\). This can be formulated as a binary classification problem, and can thus be evaluated by classification metrics such as the F1 score:

$$\begin{aligned} \text {F}_1 = 2 \frac{\text {precision} \cdot \text {recall}}{\text {precision} + \text {recall}} \end{aligned}$$
(2)

where precision and recall are defined as:

$$\begin{aligned}&\text {precision} = \frac{\text {true positives}}{\text {true positives} + \text {false positives}} \end{aligned}$$
(3)
$$\begin{aligned}&\text {recall} = \frac{\text {true positives}}{\text {true positives} + \text {false negatives}}. \end{aligned}$$
(4)

Either precision, recall or F1 score can be used as a performance measurement, each having its own merits. Precision (Eq. 3) describes to what extent the predicted feasible region actually corresponds to a region satisfying the feasibility criteria; nevertheless, it gives no indication of whether all the feasible regions in the design space have been captured by the model. On the other hand, the recall metric (Eq. 4) emphasizes to what extent all the feasible regions have been captured by the model, though it provides no indication when the model incorrectly predicts infeasible regions as feasible ones. The F1 score, defined as the harmonic mean of precision and recall, provides a good balance between these considerations. This metric has been widely used in several applications [5, 11, 32] and is particularly suitable in our scenario.
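
For reference, a direct computation of Eqs. (2)–(4) from confusion counts is straightforward; the counts used in the example are arbitrary:

```python
def precision_recall_f1(tp, fp, fn):
    # Precision, recall and F1 score from confusion counts (Eqs. 2-4).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example with arbitrary counts: 90 true positives, 10 false positives, 5 false negatives.
print(precision_recall_f1(90, 10, 5))  # (0.9, 0.947..., 0.923...)
```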

Unfortunately, the direct application of the F1 score (as well as the precision and recall) is costly because of the requirement of suitable test data. Thus, common AL methods [5, 11, 19] use a maximum number of function evaluations as the stopping criterion. However, it is not possible to estimate upfront how many function evaluations are necessary to obtain an accurate estimate of the feasible region: adopting a predefined maximum number of function evaluations can lead to an inaccurate estimation of the feasible region or to too many (unnecessary) expensive evaluations of the function f. To circumvent this problem, a common approach is to determine the stopping criterion purely based on the metamodel. Perhaps the most relevant criterion is the IMSE\(_T\) proposed in [25], in which the Kriging prediction variance is weighted by a probability specifying how likely a point is in the feasible region, and then averaged over the design space. The IMSE\(_T\) metric presents an elegant way to measure the model’s belief about its accuracy within the feasible region. However, the use of this unscaled metric in practice still depends on expert knowledge to judge whether the averaged variance is tolerable or not.

3 Active feasible region identification with stopping criterion

The proposed novel AL framework is presented in this section. We first assume that the constraint functions are continuous-valued as in [5, 33] and can be described via Gaussian process regression (Sect. 3.1). Then, the acquisition function is presented in Sect. 3.2 and the new stopping criterion is proposed in Sect. 3.3.

Fig. 2 Flowchart of the active learning framework: the Gaussian process is utilized to provide a probabilistic description of the feasible region. The data is sequentially acquired by the information-theoretic based acquisition function. The active learning process terminates once the metamodel is 95% confident that its prediction accuracy is larger than a predefined acceptable F1 score C

The main workflow of our active learning consists of the following steps:

  1. Use design of experiments (DOE) techniques to select an initial set of design samples to evaluate.

  2. In each subsequent iteration:

     (a) Build a GP regression model on the evaluated samples (Sect. 3.1).

     (b) Check if the stopping criterion has been satisfied (Sect. 3.3); if yes, proceed to step 3, otherwise continue.

     (c) Select a new sample to be evaluated by maximizing the acquisition function (Sect. 3.2).

     (d) Evaluate the new sample, augment the existing samples and go back to step 2(a).

  3. Exit the AL process and return the final metamodel for prediction.

The proposed framework is illustrated in Fig. 2.
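
A minimal sketch of this loop is given below. The component interfaces (evaluate_simulator, fit_gp, maximize_acquisition, stopping_criterion_met) are hypothetical placeholders for the building blocks detailed in Sects. 3.1–3.3, not part of any particular library:

```python
import numpy as np

def active_learning(evaluate_simulator, fit_gp, maximize_acquisition,
                    stopping_criterion_met, X_init, max_iter=200):
    # Generic active learning loop of Fig. 2 (hypothetical component interfaces).
    X = np.atleast_2d(X_init)
    y = np.array([evaluate_simulator(x) for x in X])
    gp = fit_gp(X, y)
    for _ in range(max_iter):
        if stopping_criterion_met(gp):          # step 2(b): F1-sample based check
            break
        x_new = maximize_acquisition(gp)        # step 2(c): next most informative design
        y_new = evaluate_simulator(x_new)       # step 2(d): expensive simulation
        X = np.vstack([X, x_new])
        y = np.append(y, y_new)
        gp = fit_gp(X, y)                       # step 2(a): refit the metamodel
    return gp                                   # step 3: final metamodel for prediction
```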

3.1 Gaussian process regression

There are many regression models that can be used to cheaply approximate the simulator, such as Bayesian neural networks, support vector regression, and random forest regression. Nevertheless, under the category of data-efficient machine learning, Gaussian process (GP), also known as Kriging [34, 35], is arguably the most prevalent probabilistic technique. GPs have been widely used as a regression method in many different areas including electronics engineering [31], computational fluid dynamics [36] and chemistry [37].

A GP can be considered as a distribution over functions \(f: {\mathcal {X}} \rightarrow {\mathbb {R}}\) that is completely defined by a suitable mean function \(m: {\mathcal {X}} \rightarrow {\mathbb {R}}\) and covariance function \(k: {\mathcal {X}} \times {\mathcal {X}} \rightarrow {\mathbb {R}}\). Starting from the GP prior and the training data \(D_n\), it is possible to derive a posterior belief over the function values at (possibly multiple) test points \({\varvec{X}}_*\). This belief is Gaussian distributed:

$$\begin{aligned} {\varvec{Y}}_* \vert D_n, {\varvec{X}}_* \sim {\mathcal {N}}(K_{n*}^\mathrm{{T}}(K_{nn} + \sigma ^2 I)^{-1} {\varvec{y}}, K_{**} - K_{n*}^\mathrm{{T}}(K_{nn} + \sigma ^2 I)^{-1} K_{n*}) \end{aligned}$$
(5)

where \({\varvec{X}}_*\) denotes the (possibly multiple) test points, \(K_{nn}\) is the kernel matrix between the training samples \(D_n\), \(K_{n*}\) is the kernel matrix between the training samples and the test points, \(K_{**}\) is the kernel matrix between the test points, and \(\sigma ^2\) is the variance of the Gaussian likelihood, which usually represents the noise level of the training samples. Note that Eq. (5) also reveals that sampling a path of a Gaussian process at a discrete set of points is equivalent to sampling from a known multivariate Gaussian distribution; this property will be heavily utilized for the proposed stopping criterion. The interested reader is referred to [38] for a detailed description of GPs.
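
For concreteness, a minimal NumPy sketch of the posterior computation in Eq. (5) is shown below, assuming a squared exponential kernel with fixed (unit) hyperparameters; in our experiments this is handled internally by the GP library, so the snippet is purely illustrative:

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared exponential kernel matrix between the rows of A and B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-6):
    # Posterior mean and covariance of Eq. (5) at the test points.
    K_nn = se_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_ns = se_kernel(X_train, X_test)
    K_ss = se_kernel(X_test, X_test)
    L = np.linalg.cholesky(K_nn)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_ns)
    return K_ns.T @ alpha, K_ss - v.T @ v
```

Drawing a posterior path at the test points then amounts to sampling from a multivariate Gaussian with this mean and covariance, which is exactly the property exploited by the F1-sample stopping criterion of Sect. 3.3.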

Once the GP metamodel has been built, it can be utilized to make feasibility predictions. Since the GP posterior prediction is a Gaussian distribution, we use its mean to make the feasibility predictions, i.e., the feasible region is modeled by:

$$\begin{aligned} A_\mu := \{{\varvec{x}} \in {\mathcal {X}} : I_A(\mu ({\varvec{x}}))=1\} \end{aligned}$$
(6)

Note that the use of the GP posterior mean as the feasibility predictor also embeds our assumption that, in case there is a misclassification of a design configuration’s feasibility, the cost of either wrongly predicting it as feasible (false positive) or wrongly predicting it as infeasible (false negative) is equivalent from a decision theoretic perspective [38]. We refer to [21] in case these two costs are not equivalent.

3.2 Acquisition function based on uncertainty reduction

The information-theoretic based acquisition function presented in [5, 39, 40] is adopted. The main intuition behind it is to sample the point that maximizes the expected information gain about an interval event g, given a new data point \({\varvec{x}}\). This is quantified by the reduction of differential entropy:

$$\begin{aligned} \alpha _g({\varvec{x}}) = {\mathbb {H}}(p(g \vert D_n)) - {\mathbb {E}}_{p(f_{{\varvec{x}}}\vert D_n, {\varvec{x}})}\left[ {\mathbb {H}}(p(g \vert D_n \cup \{{\varvec{x}}, f_{{\varvec{x}}}\}))\right] \end{aligned}$$
(7)

where \(p(f_{{\varvec{x}}}\vert D_n, {\varvec{x}})\) denotes the predictive response distribution at \({\varvec{x}}\), which in case of a GP is a Gaussian distribution, \({\mathbb {E}}(\cdot )\) denotes the expectation over the possible outcomes of \(f_{{\varvec{x}}}\), while \({\mathbb {H}}(\cdot )\) denotes the differential entropy, which can be used as an aggregated representation of a probability distribution’s uncertainty.

However, the calculation of Eq. (7) is intractable, since the GP posterior must be updated to calculate \(p(g \vert D_n \cup \{{\varvec{x}}, f_{{\varvec{x}}}\})\) for a certain location \({\varvec{x}}\). To alleviate the computational burden, a common alternative representation [41] arises by noting that Eq. (7) actually represents the conditional mutual information \(\text {I}\{g, f_{{\varvec{x}}} \vert D_n, {\varvec{x}}\}\). Hence, Eq. (7) can be reformulated in its symmetric form:

$$\begin{aligned} \alpha _g({\varvec{x}}) = {\mathbb {H}}(p(f_{{\varvec{x}}} \vert D_n,{\varvec{x}})) - {\mathbb {E}}_{p(g \vert D_n )}\left[ {\mathbb {H}}(p(f_{{\varvec{x}}} \vert D_n, {\varvec{x}}, g))\right] \end{aligned}$$
(8)

Let us suppose that the feasible boundary is \([\alpha , \beta ]\); the acquisition function is then formulated to maximize the expected information gain about the region of interest and its complement:

$$\begin{aligned} \begin{aligned} \alpha ({\varvec{x}})&= \sum _{i=1}^{3}\alpha _{g_i}({\varvec{x}}) \\&= 3{\mathbb {H}}(p(f_{{\varvec{x}}} \vert D_n, {\varvec{x}})) - {\mathbb {H}}(p(f_{{\varvec{x}}} \vert D_n, {\varvec{x}}, f_{{\varvec{x}}}<\alpha )) \\&- {\mathbb {H}}(p(f_{{\varvec{x}}}\vert D_n, {\varvec{x}}, \alpha<f_{{\varvec{x}}}<\beta )) \\&- {\mathbb {H}}(p(f_{{\varvec{x}}}\vert D_n, {\varvec{x}}, \beta < f_{{\varvec{x}}})) \end{aligned} \end{aligned}$$
(9)

with intervals defined as:

$$\begin{aligned} g_1=\{-\infty<f_{{\varvec{x}}}< \alpha \},\quad g_2=\{\alpha<f_{{\varvec{x}}}< \beta \},\quad g_3=\{\beta<f_{{\varvec{x}}} < \infty \} \end{aligned}$$
(10)

Now, computing Eq. (9) only involves calculating the differential entropy of a truncated Gaussian distribution. In particular, the following closed form expression holds [5]:

$$\begin{aligned} \begin{aligned} \alpha _{g_2}({\varvec{x}}) =&{\mathbb {H}}(p(f_{{\varvec{x}}} \vert D_n, {\varvec{x}})) - {\mathbb {H}}(p(f_{{\varvec{x}}}\vert D_n, {\varvec{x}}, \alpha<f_{{\varvec{x}}}<\beta )) \\ =&- \text {log}\;(Z) \\&-\frac{1}{2Z} \left[ (\alpha - \mu ({\varvec{x}})) {\mathcal {N}}(\mu ({\varvec{x}}) \vert \alpha , \sigma ^2) - (\beta - \mu ({\varvec{x}})){\mathcal {N}}(\mu ({\varvec{x}}) \vert \beta , \sigma ^2) \right] \end{aligned} \end{aligned}$$
(11)

where \(Z=\varPhi \left[ \frac{\beta - \mu ({\varvec{x}})}{\sigma ({\varvec{x}})}\right] - \varPhi \left[ \frac{\alpha - \mu ({\varvec{x}})}{\sigma ({\varvec{x}})}\right]\), and \(\varPhi\) denotes the cumulative distribution function of the standard normal distribution.
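As a minimal sketch, the closed-form term \(\alpha _{g_2}\) of Eq. (11) can be evaluated with SciPy as follows; the terms for \(g_1\) and \(g_3\) follow from the same truncated-Gaussian entropy expression applied to the corresponding one-sided intervals:

```python
import numpy as np
from scipy.stats import norm

def info_gain_interval(mu, sigma, alpha, beta):
    # Expected information gain of Eq. (11) for the interval [alpha, beta],
    # given the GP posterior mean mu and standard deviation sigma at x.
    a = (alpha - mu) / sigma
    b = (beta - mu) / sigma
    Z = norm.cdf(b) - norm.cdf(a)
    if Z <= 0.0:
        return 0.0  # the interval carries (numerically) no probability mass
    return -np.log(Z) - (a * norm.pdf(a) - b * norm.pdf(b)) / (2.0 * Z)
```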

One advantage of this information-theoretic based acquisition function is that it can be conveniently extended to handle multiple-constraint problems. Since the derivation of this case is omitted in [5], we briefly present it here. Consider the following multi-constraint problem \({\varvec{x}}\rightarrow {\varvec{f}}\), where the desired feasible region must satisfy all k black box constraints:

$$\begin{aligned} I_A({\varvec{f}}({\varvec{x}}))= {\left\{ \begin{array}{ll} 1&{} \alpha _1 \le f_1({\varvec{x}}) \le \beta _1, \alpha _2 \le f_2({\varvec{x}}) \le \beta _2,\ldots ,\alpha _k \le f_k({\varvec{x}}) \le \beta _k \\ 0&{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(12)

By making use of the mutual information \(I({\varvec{g}}; {\varvec{f}} \vert {\varvec{x}}, \varvec{D_n})\), the acquisition function can be formulated as:

$$\begin{aligned} \begin{aligned} \alpha _{{\varvec{g}}}({\varvec{x}})&= {\mathbb {H}}(p({\varvec{f}}_{{\varvec{x}}} \vert \varvec{D_n}, {\varvec{x}})) - {\mathbb {E}}_{p({\varvec{g}}\vert \varvec{D_n}, {\varvec{x}})}\left[ {\mathbb {H}}(p({\varvec{f}}_{{\varvec{x}}} \vert \varvec{D_n} \cup \{{\varvec{x}}, {\varvec{g}}\}))\right] \\&= \frac{1}{2} \text {ln det} (2\pi e\varSigma ({\varvec{x}})) - {\mathbb {H}}(p({\varvec{f}}_{{\varvec{x}}} \vert \varvec{D_n} \cup \{{\varvec{x}}, {\varvec{g}}\}))\\ \end{aligned} \end{aligned}$$
(13)

\(\varSigma ({\varvec{x}})\) is the covariance matrix between the one-dimensional components of \({\varvec{f}}\). Assuming independence between the different components of \({\varvec{f}}\), the multi-constraint counterpart of Eq. (11) can be written as:

$$\begin{aligned} \begin{aligned} \alpha _{{{\varvec{g}}}_{{\varvec{2}}}}({\varvec{x}}) =&\frac{1}{2} {\text {ln}} (2\pi e)^k\ \text {det} (\varSigma ({\varvec{x}})) - \sum _{j=1}^{k}{\mathbb {H}}(p(f_j \vert D_n \cup \{{\varvec{x}}, g_{j_2}\})) \quad \\ =&\sum _{j=1}^k \frac{1}{2} {\text {ln}} (2\pi e)\sigma _{j}^2 - \sum _{j=1}^{k}\left[ \frac{1}{2} {\text {ln}} (2\pi e \sigma _{j}^2Z_{j}^2) + \right. \\&\left. \frac{1}{2Z_j}[(\alpha _j - \mu _j){\mathcal {N}}(\mu _j \vert \alpha _j, \sigma _{j}^2) - (\beta _j - \mu _j){\mathcal {N}}(\mu _j \vert \beta _j, \sigma _{j}^2 )]\right] \\ =&\sum _{j=1}^k \alpha _{g_{j_2}}({\varvec{x}}) \end{aligned} \end{aligned}$$
(14)

Note that, for simplicity, the symbol \({\varvec{x}}\) is sometimes omitted in Eq. (14).

Once the acquisition function is defined, it can be used in the learning loop (see Fig. 2). Indeed, at each iteration, the GP metamodel is refined with a new expensive simulation, performed for the maximizer \(\bar{{\varvec{x}}}\) of the acquisition function. To estimate \(\bar{{\varvec{x}}}\), the following procedure is adopted: the acquisition function is evaluated at 1000 candidate points chosen via Monte Carlo sampling, and the best candidate point is then used as the starting point of an L-BFGS-B gradient-based optimizer [42]. This optimization process can be performed very efficiently, since the acquisition function \(\alpha ({\varvec{x}})\) depends only on the GP model, which is very cheap to evaluate.
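
A possible implementation of this optimization step is sketched below with SciPy; `acq` stands for any scalar acquisition function built on the GP, and the uniform candidate generation is an assumption (the procedure above only states that candidates are chosen via Monte Carlo sampling):

```python
import numpy as np
from scipy.optimize import minimize

def maximize_acquisition(acq, bounds, n_candidates=1000, seed=None):
    # Monte Carlo pre-screening followed by L-BFGS-B refinement.
    # acq maps a (d,) design vector to a scalar; bounds is a list of (low, high) pairs.
    rng = np.random.default_rng(seed)
    lows, highs = np.array(bounds).T
    candidates = rng.uniform(lows, highs, size=(n_candidates, len(bounds)))
    values = np.array([acq(x) for x in candidates])
    x0 = candidates[np.argmax(values)]                    # best random candidate
    res = minimize(lambda x: -acq(x), x0, bounds=bounds, method="L-BFGS-B")
    return res.x
```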

Let us remark that this information theoretic-based acquisition function has demonstrated competitive performance compared with existing active learning techniques for feasible region identification. The interested reader is referred to [5] for a detailed discussion on this topic.

3.3 Framework of metamodel-based stopping criterion

An accurate estimation of the precision, recall and F1 score requires evaluating f for a large set of samples in the design space. Since evaluating f via simulations is expensive, it is not possible to directly use these metrics as the stopping criterion of the proposed active learning strategy. In the following, it is first described how these metrics can be efficiently estimated from a Bayesian perspective, i.e., from the metamodel’s belief. Then, these estimations can be adopted as stopping criteria of the AL process (see Fig. 2). For notational simplicity, we start with the description of the stopping criterion in case of a single constraint function (k = 1), and then generalize it to multiple constraints.

For a problem defined by a single constraint \(f: {\mathbb {R}}^{m \times d} \rightarrow {\mathbb {R}}^{m \times 1}\), the corresponding feasible region is \(A_f:=\{{\varvec{x}} \in {\mathcal {X}}: \alpha \le f({\varvec{x}}) \le \beta \}\), and the feasible region predicted by the GP metamodel (with posterior mean \(\mu : {\mathbb {R}}^{m \times d} \rightarrow {\mathbb {R}}^{m \times 1}\)) is \(A_\mu :=\{{\varvec{x}} \in {\mathcal {X}}: \alpha \le \mu ({\varvec{x}})\le \beta \}\). Furthermore, let us indicate with \(\text {precision}_\mu \vert f, \alpha , \beta , X^*\) the precision of the metamodel identifying the feasible region \(A_\mu\), compared with the actual feasible region \(A_f\) based on test data \(X^*\). Let us use \(\text {recall}_\mu \vert f, \alpha , \beta , X^*\) and \(\text {F1}_\mu \vert f, \alpha , \beta , X^*\) to indicate the recall and F1 score estimated by the metamodel, respectively. These three performance measurements depend on the given feasible boundary \([\alpha , \beta ]\) and on the test data \(X^*: \{X_F^*, X_{IF}^*\} \in {\mathbb {R}}^{n \times d}\), where \(X_F^*\) and \(X_{IF}^*\) denote the data predicted as feasible and infeasible by the posterior mean \(\mu\), respectively, n is the size of the test data (m denotes a generic number of evaluation points), and d represents the problem dimensionality.

Let us further define a set operator \(I_F [C] = \{c \in C \vert \alpha<c<\beta \}\), where \(C \subset {\mathbb {R}}^{m \times 1}\), which returns the subset of elements satisfying the constraint. Let us define a similar operator for the metamodel, \(I_{A_\mu }(D)= D\cap A_\mu\), which returns the subset of elements of the input data \(D \in {\mathbb {R}}^{m \times d}\) that lie in the predicted feasible region.

With slight abuse of the notations given above, the precision, recall and F1 score can then be represented as:

$$\begin{aligned} \text {precision}_\mu \vert f, \alpha , \beta , X^*&= \frac{\vert I_{F}[f[I_{A_\mu }(X^*)]] \vert }{\vert I_F [\mu [X^*]] \vert } = \frac{\vert I_{F}[f[X_F^*]]\vert }{\vert X_F^*\vert } \end{aligned}$$
(15)
$$\begin{aligned} \text {recall}_\mu \vert f, \alpha , \beta , X^*&= \frac{\vert I_{F}[f[I_{A_\mu }(X^*)]] \vert }{\vert I_{F}[f[I_{A_\mu }(X^*)]] \vert + \vert I_{F}[f[I_{{\mathcal {X}}\backslash A_\mu }(X^*)]] \vert } = \frac{\vert I_{F}[f[X_F^*]]\vert }{ \vert I_{F}[f[X^*]]\vert } \end{aligned}$$
(16)
$$\begin{aligned} \text {F1}_\mu \vert f, \alpha , \beta , X^*&= 2 \frac{{\text {precision}}_\mu \cdot {\text {recall}}_\mu }{{\text {precision}}_\mu + {\text {recall}}_\mu } = 2\frac{\vert I_{F}[f[X_F^*]] \vert }{\vert I_{F}[f[X^*]] \vert + \vert X_F^* \vert } \end{aligned}$$
(17)

where \(\vert . \vert\) denotes the number of rows, for instance, \(\vert X^* \vert = n\).

These three performance metrics are difficult to compute, as a large number of samples needs to be evaluated by the expensive simulator as test data. To efficiently make use of these metrics, a Bayesian approach can be utilized to describe the expensive constraint f in Eqs. (15)–(17) with the belief of the GP metamodel: f is treated as a sample drawn from the GP posterior. In this framework, Eqs. (15)–(17) become random variables.

First, we will present an analytical analysis of the precision metric under our setting. Specifically, we provide an unbiased analytical estimation of its mean and an analytical approximation of its variance.

Proposition 1

Given the assumption that the black box function is a random sample path of a known GP posterior, \(f \sim {{\mathcal {GP}}}\), we have:

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}_f(\text {precision}_\mu \vert f, \alpha , \beta , X^*) \\&= \frac{1}{N_F}\sum _{i=1}^{N_F} \left[ \varPhi \left( \frac{\beta -\mu ({\varvec{x}}_{F_i}^*)}{\sigma ({\varvec{x}}_{F_i}^*)}\right) -\varPhi \left( \frac{\alpha -\mu ({\varvec{x}}_{F_i}^*)}{\sigma ({\varvec{x}}_{F_i}^*)}\right) \right] \\&= {\mathbb {E}}_f \left[ \frac{1}{N_F}\sum _{i=1}^{N_F} I_{F}[f({\varvec{x}}_{F_i})]\right] \approx {\mathbb {E}}_f\left[ V(A_{f \subseteq \mu })/V(A_\mu )\right] \end{aligned} \end{aligned}$$
(18)

where \(N_F\) is the size of the predicted feasible data set \(X_F^*\), i.e., \(\vert X_F^* \vert\), \({\varvec{x}}_{F_i}^*\) denotes a specific point i in \(X_F^*\), \(V(\cdot )\) denotes the volume of a region, and \(A_{f \subseteq \mu }\) denotes the truly feasible region contained within the predicted feasible space \(A_\mu\).

A generalized proof of Proposition 1 (which is applicable in case of multiple constraints) is given in Appendix (Sect. 1.1). Proposition 1 highlights three properties of the Bayesian treatment of the precision metric. First and foremost, when the black box function f is described by the GP posterior, the precision metric obtained by comparing the posterior mean prediction with f is a random variable, and the second line of Eq. (18) provides an unbiased analytical expression of its expectation. Second, the last line of Eq. (18) reveals the connection of the precision metric with the excursion search method [20], which aims to provide an accurate estimation of the probability of failure within a given design space \({\mathcal {X}}\). This brings an intuitive geometric interpretation of the equation: as the amount of test data increases, the analytical expression converges to the expected feasible volume ratio within the model-predicted feasible space \(A_\mu\). Third, building on the second property, the last line of Eq. (18) also justifies applying the metamodel to arbitrary test data: Eq. (18) remains a meaningful reference even though it is originally conditioned on (possibly different) test data \({\varvec{X}}^*\), since as long as both data sets are densely distributed in the design space, applying Eq. (18) to different test data sets approximates the same geometric quantity.

We now demonstrate an approximation of the variance of the precision metric, a generalized proof of which is given in Appendix (Sect. 1.2).

Proposition 2

Under the same assumptions and notation of Proposition 1, the variance of the precision metric can be approximated as:

$$\begin{aligned} \begin{aligned}&{\mathbb {V}}_f(\text {precision}_\mu \vert f, \alpha , \beta , X^*) \approx \frac{1}{N_{F}^2}\sum _{i=1}^{N_F} \left[ W(\varvec{{\varvec{x}}_{F_i}^*})- W(\varvec{{\varvec{x}}_{F_i}^*})^2\right] \end{aligned} \end{aligned}$$
(19)

where \(W({\varvec{x}}_{F_i}^*):=\left[ \varPhi \left( \frac{\beta -\mu ({\varvec{x}}_{F_i}^*)}{\sigma ({\varvec{x}}_{F_i}^*)}\right) -\varPhi \left( \frac{\alpha -\mu ({\varvec{x}}_{F_i}^*)}{\sigma ({\varvec{x}}_{F_i}^*)}\right) \right]\); the approximation becomes an equality when the GP posterior predictions are conditionally independent, i.e., when \(\mathrm{{cov}}({\varvec{Y}}_* \vert D_n, {\varvec{X}}_*)\) is diagonal.
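
A minimal sketch of Eqs. (18) and (19) is given below, assuming arrays of posterior means and standard deviations at the test points predicted as feasible:

```python
import numpy as np
from scipy.stats import norm

def precision_mean_var(mu_F, sigma_F, alpha, beta):
    # Analytical mean (Eq. 18) and approximate variance (Eq. 19) of the precision,
    # given GP posterior means/stds at the N_F test points predicted as feasible.
    W = norm.cdf((beta - mu_F) / sigma_F) - norm.cdf((alpha - mu_F) / sigma_F)
    return W.mean(), np.sum(W - W**2) / len(W)**2
```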

The benefit of using a Bayesian perspective to describe the precision metric (as well as the recall and F1 score) is twofold. First, the Bayesian description of f by the GP posterior circumvents the use of precalculated expensive test data. Second, the reformulation of the performance metric as a random variable also allows the calculation of its variance, which is of practical interest since confidence bounds can be extracted from it.

Unfortunately, similar analytical forms do not exist for the recall and F1 score metric. Hence, to make use of these metrics as stopping criteria, following the same Bayesian treatment, one can sample the possible f from the GP posterior to calculate the possible F1 scores, as shown in Algorithm 1. This new algorithm, referred to as F1-sample, is used in this paper to describe to what extent the feasible region has been modeled.

Algorithm 1 F1-sample (pseudocode not reproduced)
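
Since the pseudocode figure is not reproduced here, a minimal sketch of the F1-sample procedure described above is given instead; it assumes the GP posterior mean and covariance of Eq. (5) at the test points and a single constraint with bounds \([\alpha , \beta ]\):

```python
import numpy as np

def f1_sample(mean, cov, alpha, beta, n_paths=500, seed=None):
    # Sample possible F1 scores (single constraint) from the GP posterior:
    # mean/cov are the posterior moments of Eq. (5) at the test points.
    rng = np.random.default_rng(seed)
    pred_feasible = (alpha <= mean) & (mean <= beta)        # A_mu of Eq. (6)
    paths = rng.multivariate_normal(mean, cov, size=n_paths)
    scores = []
    for f in paths:                                         # each path is one possible f
        true_feasible = (alpha <= f) & (f <= beta)
        tp = np.sum(pred_feasible & true_feasible)
        denom = np.sum(true_feasible) + np.sum(pred_feasible)
        scores.append(2.0 * tp / denom if denom > 0 else 1.0)  # degenerate case: empty sets
    return np.array(scores)
```

The 5th percentile of the returned scores (e.g., np.percentile(scores, 5)) is then used as the conservative point estimate in the stopping rule of Fig. 2.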

In case the feasible region is defined by multiple (k in total) black box constraints, \(A := \{{\varvec{x}} \in {\mathcal {X}} : \bigcap \nolimits _{j=1}^{k} \alpha _j< f_j({\varvec{x}})<\beta _j\}\), a histogram of sampled F1 scores could be evaluated for each of the constraints. However, this does not provide an overall measure of the prediction model’s accuracy.

Hence, an aggregated F1 score is proposed in the following. We first introduce some additional definitions to properly describe the criterion for multi-constraint problems from a practical implementation perspective. Let us define a set operator \(I_{{\varvec{A}}_\mu }(D)= D\cap \bigcap \nolimits _{j=1}^{k} A_{\mu _j}\), which returns the subset of elements of the input data \(D \in {\mathbb {R}}^{m \times d}\) that lie in the joint feasible region. Let \(I_{F_j}' [\cdot ]: {\mathbb {R}}^{m} \rightarrow \{0, 1\}^m\) denote whether function values are within the feasible interval of constraint j, while the operator \(\vert \cdot \vert _1\) returns the number of elements equal to 1. Assuming the same bounded design space \({\mathcal {X}}\) for all functions and using the same test data \(X^*\) for each of the constraints, the predictions of these models can be aggregated into a single final model by reformulating the F1 score as a random variable:

$$\begin{aligned} \text {F1}_{\varvec{\mu }} ({\varvec{f}}) \vert \varvec{\alpha }, \varvec{\beta }, X^* = 2\frac{ \vert \prod _{j=1}^k I_{F_j}' [f_j(I_{{\varvec{A_\mu }}}[X^*])] \vert _1}{\vert \prod _{j=1}^k I_{F_j}' [f_j[X^*]] \vert _1 + \vert X_F^* \vert } \end{aligned}$$
(20)

As this expression cannot be trivially decomposed, we can also sample its distribution as in Algorithm 2.

Algorithm 2 Sampling the aggregated F1 score for multiple constraints (pseudocode not reproduced)

The recall metric can also be sampled by following a similar procedure. Note that the proposed strategy is also applicable to any other suitable probabilistic model; in this research, we restrict our discussion to the framework of Gaussian processes.

Since the proposed F1-sample measure heavily depends on sampling from the Gaussian process defined in Eq. (5), it is worthwhile to describe the computational complexity of calculating the proposed stopping criterion. For the general multi-constraint case, the computational complexity in case of exact sampling (A.2 of [38]) is \({\mathcal {O}}(n^3Nk)\), where \(n=|X^*|\) is the number of test samples, N is the number of GP posterior path samples and k denotes the number of constraints. More efficient F1 score estimation is a topic of future research; we refer to [44,45,46] for further discussion.

To utilize the samples from Algorithms 1 and 2 as an indicator of the metamodel’s prediction accuracy, the sampled F1 scores (or recall) can be averaged as an unbiased estimator of the mean of Eq. (20), which is referred to as \(\overline{\text {F1-sample}}\). Meanwhile, as the F1-sample measure relies on the assumption that the real function f is generated by a GP (Proposition 1), in reality this assumption can result in either overestimation or underestimation of the real F1 score. Thus, to provide a more robust estimation, several different strategies can be adopted. For instance, instead of using a fixed test data set, it can be randomly regenerated for each F1 sample. Alternatively, one can employ a fully Bayesian approach by placing a prior distribution on the kernel hyperparameters and sampling the posterior GP to estimate the F1 score. Moreover, one can take a step further by sampling from a mixture of possible kernels (e.g., [47]). Nevertheless, such approaches result in considerably more computational effort within the current AL framework. Hence, in order to efficiently make use of the proposed F1-sample metric as a stopping criterion, as shown in Fig. 2, a conservative approach can be adopted by checking when the metamodel is 95% confident that its predicted F1 score is larger than an acceptable F1 score, i.e., by using the 5th percentile of the F1-sample distribution as a point estimate. Once this condition holds for several consecutive iterations (5 iterations are used in this paper), the AL process is stopped.
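
A minimal sketch of this stopping check is given below; the acceptable score and the number of consecutive iterations (patience) are the user-chosen values described above:

```python
def should_stop(f1_fifth_percentile_history, acceptable_f1, patience=5):
    # Stop when the 5th percentile of the sampled F1 scores has exceeded the
    # acceptable score C for `patience` consecutive iterations (Fig. 2).
    recent = f1_fifth_percentile_history[-patience:]
    return len(recent) == patience and all(p >= acceptable_f1 for p in recent)
```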

4 Numerical examples and application examples

First, synthetic experiments are conducted to validate the \(\overline{\text {F1-sample}}\) metric, i.e., to investigate to what extent it is able to capture the real F1 score. Afterwards, applications to Electronic Design Automation (EDA) examples are presented, where feasible region identification is of practical interest during design space exploration. In particular, an antenna and a distributed filter are studied, both operating in the microwave spectrum. Different design constraints are applied to their frequency responses, simulated using ADS Momentum [48]. In the first EDA example, the feasible region is defined by a single constraint on the device Return Loss (RL). In the second EDA example, a feasible region determined by multiple constraints must be identified, based on the filter operating bandwidth. Both geometrical parameters and material properties are used as design parameters, so that microwave designers can find feasible solutions for different manufacturing processes or layout techniques.

All the experiments are conducted on a computer with an Intel(R) Core(TM) i7-8650U @ 1.9 GHz processor and 16 GB RAM. The AL framework is implemented using GPFlowOpt [49]. Regarding the GP hyperparameters, the mean function m is chosen equal to zero, while the automatic relevance determination (ARD) [38] version of the squared exponential (SE) kernel is employed. The kernel hyperparameters are determined using maximum likelihood estimation with an L-BFGS-B optimizer. For the test data \(X^*\) of the F1-sample metric, we use 10,000 random samples. Then, to calculate the F1-sample score, \(N=500\) function samples are drawn from the posterior GP. Finally, 20 experiments are conducted independently to provide robust results.

4.1 Numerical examples

The numerical experiments are conducted on seven different benchmark functions, described in Table 1 along with their feasible regions. To give an indication of the difficulty of the problems studied here, the feasible ratio and the feasible variance ratio are also reported in Table 1. The feasible region ratio indicates the ratio between the volumes of the feasible region and the total bounded design space, and it is defined as:

$$\begin{aligned} R_\mathrm{{fr}} = V(A)/ V({\mathcal {X}})\ (\%), \end{aligned}$$
(21)

where the operator \(V(\cdot )\) calculates the volume of a specified region. The feasible variance ratio is the function variance within the feasible region over the variance in the design spaces, and is defined as:

$$\begin{aligned} \begin{aligned}&R_\mathrm{{fv}} = \varDelta \{f \vert A\}/\varDelta \{f \vert {\mathcal {X}}\} \ (\%)\\&\text {where}\ \varDelta \{f \vert {\mathcal {X}}\} = \underset{{\varvec{x}} \in {\mathcal {X}}}{\text {max}} f - \underset{{\varvec{x}} \in {\mathcal {X}}}{\text {min}} f\\&\quad \quad \ \ \ \varDelta \{f \vert A\} = \underset{{\varvec{x}} \in A}{\text {max}} f - \underset{{\varvec{x}} \in A}{\text {min}} f \end{aligned} \end{aligned}$$
(22)
Table 1 Benchmark functions settings

4.1.1 Assessment of stopping criterion on random distributed data

The accuracy of the F1 score estimated by \(\overline{\text {F1-sample}}\) is assessed by training a one-shot GP model on random data sets with an increasing number of samples T. Adopting random samples to build the GP makes it difficult to accurately model the feasible region, especially in cases where the feasible ratio is small: it might happen that no point is located in the feasible region. This allows a fair evaluation of the F1-sample score even when the metamodel is less confident. The F1-sample score is compared against the deterministic F1 score estimated on a random test data set of 10,000 samples.

As a comparison, we also investigate the performance of the IMSE\(_T\), which has widely been applied as an accuracy measurement of the feasible region within the design space:

$$\begin{aligned} \text {IMSE}_T = \int _{\mathcal {X}} s_k^2({\varvec{x}}) W({\varvec{x}})\mathrm{{d}}{\varvec{x}} \end{aligned}$$
(23)

where the so-called interval weight \(W({\varvec{x}}) = \varPhi (\frac{\beta - \mu ({\varvec{x}})}{\sigma ({\varvec{x}})}) - \varPhi (\frac{\alpha - \mu ({\varvec{x}})}{\sigma ({\varvec{x}})})\) (Eq. 21 of [25]) is the probability that the response at \({\varvec{x}}\) lies inside the interval.
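
A Monte Carlo sketch of Eq. (23) is shown below, assuming posterior means and standard deviations evaluated at uniformly drawn test points; the estimate equals the integral up to the constant volume \(V({\mathcal {X}})\):

```python
import numpy as np
from scipy.stats import norm

def imse_t(mu, sigma, alpha, beta):
    # Monte Carlo estimate of IMSE_T (Eq. 23): the posterior variance weighted by the
    # probability of lying inside [alpha, beta], averaged over uniform test points.
    W = norm.cdf((beta - mu) / sigma) - norm.cdf((alpha - mu) / sigma)
    return np.mean(sigma**2 * W)
```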

To evaluate the performance of the metric estimation, two criteria are considered. First, a good prediction accuracy measurement should follow the same trend as the evaluation metric computed on test data, which is the F1 score in our setting. This can be assessed via the linear correlation between the stopping criterion and the F1 score. Moreover, the deviation of the estimation with respect to the F1 score should be sufficiently small, since either overestimating or underestimating the F1 score will provide misleading results to designers.

Table 2 Benchmark results of random sampling
Fig. 3 Deviation progress plot of random sampling: the deviation of the averaged stopping criteria versus the F1 score is shown as the F1 score increases; a positive deviation means an overestimation of the F1 score, a negative deviation means an underestimation

Results in the form of linear correlation are provided in Table 2. The deviation with respect to the F1 score is also provided in Fig. 3. This progress plot is constructed by linearly interpolating the F1 score versus the F1 score predicted by the two metrics. Since the metamodel is built by random sampling, all the experiments are independent, hence the F1 score prediction is only reported with its mean. One immediate observation is that the IMSE\(_T\) metric, while it also measures the metamodel’s accuracy, cannot be used to describe the F1 score directly, as it is unscaled and negatively correlated with the F1 score. Thus, a scaled IMSE\(_T\) is used here for a fair comparison, denoted as \(\text {IMSE}_T^+:=1 - \text {IMSE}_T/\max (\text {IMSE}_T)\). Note that this comparison is not performed for the Curved benchmark function due to the small volume of its feasible region, which would require an unaffordable number of samples to provide an accurate metamodel. For all the other benchmark functions, it can be seen that the proposed \(\overline{\text {F1-sample}}\) clearly provides a better correlation with the F1 score than \(\text {IMSE}_T^+\). Looking at the deviation progress plot in Fig. 3, both stopping criteria converge when the model accuracy improves, i.e., when the F1 score approaches 1. Nevertheless, \(\overline{\text {F1-sample}}\) has a smaller deviation for almost all the processes.

Fig. 4 Progress plot of the active learning process: the deviation of the stopping criteria versus the F1 score is shown. A positive deviation means an overestimation of the F1 score; a negative deviation means an underestimation. The shaded areas with black edges represent 2\(\sigma\) of the \(\overline{\text {F1-sample}}\) metric

4.1.2 Assessment of stopping criterion in active learning

To further demonstrate the advantage of F1-sample in the AL scheme, experiments for active feasible region identification are conducted, now based on the proposed entropy-based acquisition function. The initial data are generated by a max–min Latin hypercube design (LHD). The starting and stopping sample numbers of the experiments, as well as the corresponding results, are reported in Table 3. For some benchmark functions, like the Goldstein-Price function, low correlation is shown by both metrics. The main reason is that the feasible region is especially difficult to model in this case, as the function variance is considerably larger than the variance within the specified feasible region. Additionally, in earlier stages the metamodel accuracy is not high enough for the criterion to closely follow the F1 score trend, which changes along with the learning steps. This issue gradually disappears as the model accuracy grows.

As a complementary benchmark, a multi-constraint experiment is executed. Since there is no trivial extension of IMSE\(_T\) to this setting, it is not compared here and is reported as not applicable (NA). Results with random sampling and active learning are also provided in Table 3 and Fig. 4. A high correlation can be observed consistently.

Table 3 Benchmark results of active learning

4.2 Design of a microstrip patch antenna

Fig. 5 Microstrip patch antenna geometry and design parameters

We now demonstrate the usage of the proposed AL framework in the electronic design of a microstrip antenna, which consists of a metal patch on a dielectric substrate, with a ground metal layer on the back side. The patch geometry is shown in Fig. 5, while the design parameters can be found in Table 4. The feed line for the signal is matched with a 50 \(\Omega\) impedance at 2.4 GHz. The antenna is simulated in the range [2–3] GHz with ADS Momentum, and a typical simulation requires about 45 seconds. The constraint function is the frequency for which the Return Loss (RL) is minimal: if this value lies in [2.35, 2.45] GHz, the design is considered feasible. Mathematically, the problem is formulated as follows:

$$\begin{aligned} \begin{aligned}&\text {Find}\ A = \{{\varvec{x}} \in {\mathcal {X}} \vert I_A(f({\varvec{x}})) =1 \}\\&\text {where}\ I_A(f({\varvec{x}}))= {\left\{ \begin{array}{ll} 1&{} 2.35 \le f({\varvec{x}}) \le 2.45 \\ 0&{} \text {otherwise} \end{array}\right. }\\&\quad \quad \quad f({\varvec{x}}):= \underset{\zeta \in [2,3]}{\text {argmin}}({\text {RL}}_{{\varvec{x}}}(\zeta )) \end{aligned} \end{aligned}$$
(24)

where \({\text {RL}}_{{\varvec{x}}}(\cdot )\) denotes the RL curve for a given design \({\varvec{x}}\), and \(f({\varvec{x}})\) denotes the frequency \(\zeta\) in the range [2, 3] GHz at which the RL curve attains its minimum.
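
For clarity, the mapping from a simulated RL curve to the scalar constraint value of Eq. (24) could be implemented along the following lines; the simulator call itself is omitted and the array names are illustrative:

```python
import numpy as np

def resonance_frequency(freqs_ghz, rl_db):
    # f(x) of Eq. (24): the frequency at which the simulated Return Loss is minimal.
    return freqs_ghz[np.argmin(rl_db)]

def antenna_feasible(freqs_ghz, rl_db, lo=2.35, hi=2.45):
    # Feasibility indicator for the design whose RL curve is given.
    return lo <= resonance_frequency(freqs_ghz, rl_db) <= hi
```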

Table 4 Microstrip patch antenna design parameters range
Fig. 6 Progress plot of active feasible region identification for the microstrip patch antenna. Sampled F1 histograms are shown at different iterations, where the vertical dashed line denotes the 5th percentile of the histogram

An initial GP model is built using 30 initial samples, which are generated by LHD. The progress plot of the predicted F1 score is shown in Fig. 6. It can be seen that the F1-sample keeps increasing with the number of iterations. The sampled F1 score histogram with its 5th percentile is also shown at three different stages of the AL process. By applying the proposed stopping criterion, the six-dimensional AL process stopped after 71 iterations, and the whole AL process lasted about 62 minutes.

To evaluate the performance of the final learned GP, 3000 antenna configurations are randomly generated in the design space and the final GP metamodel is adopted to predict their feasibility. Only 372 have been predicted as feasible. We evaluated the performance of these predicted feasible designs with the ADS Momentum simulator; the resulting RL curves are shown in Fig. 7. The final learned GP metamodel has a precision score of 98.12% and an F1 score of 96.69%.

Fig. 7 Microstrip patch antenna: RL of the 372 designs predicted as feasible by the metamodel. Among these, 365 are true positives (in green), while 7 are false positives (in red). Besides, 18 of the 2628 designs predicted as infeasible are false negatives. Note that both the false positives and false negatives are near the edge of the feasible region

4.3 Design of a double-folded microstrip band-stop filter

Fig. 8 Double-folded microstrip band-stop filter geometry and design parameters

The second application example is a double-folded microstrip band-stop filter [53], formed by a dielectric substrate between a top metallization and a bottom ground plane. As illustrated in Fig. 8, the geometry of the upper layer consists of two stubs, with identical length and spacing, folded onto the two sides of a transmission line. Here, the proposed AL framework can be applied to identify all possible parametric configurations for which the stop band presents the desired width and location. Hence, the feasible region is defined by the following conditions:

$$\begin{aligned}&\mathrm{{BW}} \le 7\,\text {GHz}; \quad 13.5 \le f_0 \le 14.5\, \text {GHz} \end{aligned}$$
(25)
$$\begin{aligned}&\mathrm{{BW}} = f_{H} - f_{L} ; \quad f_0 = \frac{f_H + f_L}{2} \end{aligned}$$
(26)

where BW is the bandwidth and \(f_0\) is the central frequency. The filter response is computed at 101 frequency points over the frequency range [5–25] GHz. Then, BW and \(f_0\) are defined w.r.t. the higher and lower band-stop frequencies (\(f_H\), \(f_L\)), which correspond to a −3 dB magnitude response.
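
One possible way to extract \(f_L\), \(f_H\) and hence BW and \(f_0\) (Eqs. 25–26) from the simulated magnitude response is sketched below; it assumes the stop band is the contiguous frequency range where the response drops below −3 dB, and the array names are illustrative:

```python
import numpy as np

def bandstop_bw_f0(freqs_ghz, s21_db, level_db=-3.0):
    # Bandwidth and central frequency (Eqs. 25-26) from the -3 dB crossings of |S21|,
    # assuming a single contiguous stop band in the simulated frequency range.
    below = np.where(s21_db <= level_db)[0]
    if below.size == 0:
        return 0.0, float("nan")                # no stop band found
    f_l, f_h = freqs_ghz[below[0]], freqs_ghz[below[-1]]
    return f_h - f_l, 0.5 * (f_h + f_l)

def filter_feasible(bw, f0):
    # Feasibility conditions of Eq. (25).
    return bw <= 7.0 and 13.5 <= f0 <= 14.5
```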

Table 5 Double-folded microstrip band-stop filter design parameters range

The design space has four parameters, as shown in Table 5, and the evaluation of a single design configuration takes about 42 s. The initial GP model is built upon 10 initial samples from an LHD and updated by the proposed entropy search-based method. The corresponding progress plot is depicted in Fig. 9, showing a rapid convergence of the F1-sample score for this multi-constraint problem. The AL process is halted when F1-sample remains larger than 0.98 for 5 consecutive iterations. The overall AL process takes about 14 minutes.

The accuracy of the computed metamodel is verified by comparison with 1000 ADS simulations. The results are shown in Fig. 10, indicating a precision score of 98.22% and an F1 score of 98.08%.

Fig. 9 Progress plot of active feasible region identification for the double-folded microstrip band-stop filter. Sampled F1 histograms are shown at different iterations, where the vertical dashed line denotes the 5th percentile of the histogram

Fig. 10 Double-folded microstrip band-stop filter: the magnitude of the element \(S_{21}\) of the scattering matrix (representing the frequency response of the filter) for the 338 designs predicted as feasible by the metamodel out of 1000 randomly distributed designs. Among these, 332 are true positives (in green), while 6 are false positives (in red). Besides, 7 of the 668 designs predicted as infeasible are false negatives

5 Conclusion

An information-theoretic based active learning method has been introduced to sequentially learn an accurate Gaussian process metamodel for generating feasible design candidates. In particular, feasible regions can be identified with an efficient, robust and automated procedure, where multiple constraints can be specified at the same time. Furthermore, a novel AL stopping criterion has been proposed, based on a probabilistic description of the prediction accuracy. The proposed framework has been applied to two real-world electromagnetic design problems: a six-dimensional design space with a single constraint, and a four-dimensional problem with two constraints. In both applications, the method is able to reach a high-precision representation. Moreover, the number of samples needed to compute the model is minimized thanks to the novel sampling strategy, as well as to the stopping criterion, which allows online monitoring of the metamodel’s belief about the feasible region and thus stops the algorithm efficiently. This indicates that the model is suitable to simulate the behaviour of the real system on a confined region, and it can be successfully employed to generate a large number of good design candidates in complex engineering problems.

There are still several limitations to the proposed algorithm. For the F1-sample itself, the stopping criterion depends on numerical integration over the input domain, which makes it difficult to apply in high-dimensional problems. Moreover, the sampling process used to approximate the distribution requires additional computation. Future research will focus on overcoming these limitations.