Abstract
Bayesian Optimization using Gaussian Processes is a popular approach for optimization problems involving expensive black-box functions. However, because classic Gaussian Processes assume a stationary covariance function, this method may be poorly suited to the non-stationary functions that arise in such problems. To overcome this issue, Deep Gaussian Processes can be used as surrogate models instead of classic Gaussian Processes. By considering a functional composition of stationary Gaussian Processes arranged in a multiple-layer structure, this modeling technique gains the representational power needed to capture non-stationarity. This paper investigates the application of Deep Gaussian Processes within the Bayesian Optimization context. The specificities of this optimization method are discussed and highlighted on academic test cases. The performance of Bayesian Optimization with Deep Gaussian Processes is assessed on analytical test cases and aerospace design optimization problems, and compared with state-of-the-art stationary and non-stationary Bayesian Optimization approaches.
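The "functional composition of stationary Gaussian Processes" mentioned above can be illustrated with a minimal sketch (not the authors' code): a two-layer DGP prior sample is obtained by drawing a path from a stationary GP and feeding it as input to a second stationary GP, which yields a non-stationary overall prior.

```python
# Illustrative sketch, assuming RBF kernels and zero-mean layers:
# a two-layer DGP prior sample as the composition f2(f1(x)).
import numpy as np

def rbf_cov(a, b, lengthscale=1.0, variance=1.0):
    """Stationary squared-exponential covariance k(a, b)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def sample_gp(x, lengthscale, rng):
    """Draw one sample path of a zero-mean GP at inputs x."""
    K = rbf_cov(x, x, lengthscale) + 1e-8 * np.eye(len(x))  # jitter for stability
    return rng.multivariate_normal(np.zeros(len(x)), K)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
h = sample_gp(x, lengthscale=0.3, rng=rng)  # hidden layer: warps the input space
y = sample_gp(h, lengthscale=0.3, rng=rng)  # output layer: GP over the warped inputs
# y is a draw from a non-stationary prior even though each layer is stationary
```

Each layer alone is stationary; the composition is not, which is what gives DGPs their extra representational power.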
References
Amari S-I, Douglas SC (1998) Why natural gradient? In: Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing, ICASSP’98 (Cat. No. 98CH36181), vol 2. IEEE, pp 1213–1216
Bouhlel MA, Bartoli N, Regis RG, Otsmane A, Morlier J (2018) Efficient global optimization for high-dimensional constrained problems by using the kriging models combined with the partial least squares method. Eng Optim 50(12):2038–2053
Atkinson PM, Lloyd CD (2007) Non-stationary variogram models for geostatistical sampling optimisation: an empirical investigation using elevation data. Comput Geosci 33(10):1285–1300
Audet C, Denni J, Moore D, Booker A, Frank P (2000) A surrogate-model-based method for constrained optimization. In 8th symposium on multidisciplinary analysis and optimization, p 4891
Bartoli N, Lefebvre T, Dubreuil S, Olivanti R, Priem R, Bons N, Martins JRRA, Morlier J (2019) Adaptive modeling strategy for constrained global optimization with application to aerodynamic wing design. Aerosp Sci Technol 90:85–102
Basu K, Ghosh S (2017) Analysis of Thompson sampling for Gaussian process optimization in the bandit setting. arXiv preprint arXiv:1705.06808
Breiman L (2017) Classification and regression trees. Routledge, Abingdon
Bui T, Hernández-Lobato D, Hernández-Lobato JM, Li Y, Turner R (2016) Deep Gaussian processes for regression using approximate expectation propagation. In: International conference on machine learning, pp 1472–1481
Cordery I, Yao SL (1993) Non-stationarity of phenomena related to drought. In: Extreme hydrological events: proceedings of the international symposium, Yokohama, 1993
Cox DD, John S (1997) SDO: a statistical method for global optimization. In: Multidisciplinary design optimization: state-of-the-art, pp 315–329
Dai Z, Damianou A, González J, Lawrence N (2015) Variational auto-encoded deep Gaussian processes. arXiv preprint arXiv:1511.06455
Damianou A, Lawrence N (2013) Deep Gaussian processes. In: Artificial intelligence and statistics, pp 207–215
de G Matthews AG, van der Wilk M, Nickson T, Fujii K, Boukouvalas A, León-Villagrá P, Ghahramani Z, Hensman J (2017) GPflow: a Gaussian process library using TensorFlow. J Mach Learn Res 18(40):1–6
Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley, New York
Frazier PI (2018) A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811
Garg S, Singh A, Ramos F (2012) Learning non-stationary space-time models for environmental monitoring. In: Twenty-sixth AAAI conference on artificial intelligence, Toronto
Gibbs MN (1998) Bayesian Gaussian processes for regression and classification. PhD thesis, University of Cambridge
Gramacy RB, Apley DW (2015) Local Gaussian process approximation for large computer experiments. J Comput Gr Stat 24(2):561–578
Gramacy RB, Lee HKH (2008) Bayesian treed Gaussian process models with an application to computer modeling. J Am Stat Assoc 103(483):1119–1130
Gray JS, Hwang JT, Martins JRRA, Moore KT, Naylor BA (2019) OpenMDAO: an open-source framework for multidisciplinary design, analysis, and optimization. Struct Multidiscip Optim 59:1075–1104
Haas TC (1990) Kriging and automated variogram modeling within a moving window. Atmos Environ Part A Gen Top 24(7):1759–1769
Havasi M, Hernández-Lobato JM, Murillo-Fuentes JJ (2018) Inference in deep Gaussian processes using stochastic gradient Hamiltonian Monte Carlo. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems, vol 31. Curran Associates Inc, New York, pp 7506–7516
Hensman J, Fusi N, Lawrence ND (2013) Gaussian processes for big data. arXiv preprint arXiv:1309.6835
Hernández-Lobato JM, Hoffman MW, Ghahramani Z (2014) Predictive entropy search for efficient global optimization of black-box functions. In: Advances in neural information processing systems, pp 918–926
Higdon D, Swall J, Kern J (1999) Non-stationary spatial modeling. Bayesian Stat 6(1):761–768
Hoffman MD, Brochu E, de Freitas N (2011) Portfolio allocation for Bayesian optimization. In: UAI. Citeseer, pp 327–336
Huang W, Zhao D, Sun F, Liu H, Chang E (2015) Scalable Gaussian process regression using deep neural networks. In: Twenty-fourth international joint conference on artificial intelligence
Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Glob Optim 13(4):455–492
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Konda S (2006) Fitting models of nonstationary time series: an application to EEG data. PhD thesis, Case Western Reserve University
Krityakierne T, Ginsbourger D (2015) Global optimization with sparse and local Gaussian process models. In: International workshop on machine learning, optimization and big data. Springer, Berlin, pp 185–196
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436
Marmin S, Ginsbourger D, Baccou J, Liandrat J (2018) Warped Gaussian processes and derivative-based sequential designs for functions with heterogeneous variations. SIAM/ASA J Uncertain Quantif 6(3):991–1018
Milly PCD, Betancourt J, Falkenmark M, Hirsch RM, Kundzewicz ZW, Lettenmaier DP, Stouffer RJ (2008) Stationarity is dead: Whither water management? Science 319(5863):573–574
Močkus J (1975) On Bayesian methods for seeking the extremum. In: Optimization techniques IFIP technical conference. Springer, Berlin, pp 400–404
Paciorek CJ, Schervish MJ (2006) Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17(5):483–506
Papoulis A, Unnikrishna P (1991) Probability, random variables and stochastic processes. Tata McGraw-Hill Education, New York
Parr JM, Keane AJ, Forrester AIJ, Holden CME (2012) Infill sampling criteria for surrogate-based optimization with constraint handling. Eng Optim 44(10):1147–1166
Picheny V, Wagner T, Ginsbourger D (2013) A benchmark of kriging-based infill criteria for noisy optimization. Struct Multidiscip Optim 48(3):607–626
Picheny V, Gramacy RB, Wild S, Le Digabel S (2016) Bayesian optimization under mixed constraints with a slack-variable augmented lagrangian. In: Advances in neural information processing systems, pp 1435–1443
Powell MJD (2009) The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, pp 26–46
Powell MJD (2003) On trust region methods for unconstrained minimization without derivatives. Math Program 97(3):605–623
Priem R, Bartoli N, Diouane Y (2019) On the use of upper trust bounds in constrained Bayesian optimization infill criteria. In: AIAA aviation 2019 forum, p 2986
Qin AK, Huang VL, Suganthan PN (2009) Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans Evol Comput 13(2):398–417
Rasmussen CE, Ghahramani Z (2002) Infinite mixtures of Gaussian process experts. In: Advances in neural information processing systems, pp 881–888
Rasmussen C, Williams CKI (2006) Gaussian processes for machine learning, vol 1. MIT Press, Cambridge
Remes S, Heinonen M, Kaski S (2017) Non-stationary spectral kernels. In: Advances in neural information processing systems, pp 4642–4651
Salimbeni H, Deisenroth M (2017) Doubly stochastic variational inference for deep Gaussian processes. In: Advances in neural information processing systems, pp 4588–4599
Salimbeni H, Eleftheriadis S, Hensman J (2018) Natural gradients in practice: non-conjugate variational inference in Gaussian process models. In: Artificial intelligence and statistics
Sampson P, Guttorp PD (1992) Nonparametric estimation of nonstationary spatial covariance structure. J Am Stat Assoc 87(417):108–119
Sasena MJ (2002) Flexibility and efficiency enhancements for constrained global design optimization with kriging approximations. PhD thesis, University of Michigan Ann Arbor, MI
Sasena MJ, Papalambros PY, Goovaerts P (2001) The use of surrogate modeling algorithms to exploit disparities in function computation time within simulation-based optimization. Constraints 2:5
Schonlau M, Welch WJ, Jones D (1996) Global optimization with nonparametric function fitting. In: Proceedings of the ASA, section on physical and engineering sciences, pp 183–186
Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N (2015) Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE 104(1):148–175
Shahriari B, Wang Z, Hoffman MW, Bouchard-Côté A, de Freitas N (2014) An entropy search portfolio for Bayesian optimization. arXiv preprint arXiv:1406.4625
Snelson E, Ghahramani Z (2006) Sparse Gaussian processes using pseudo-inputs. In: Advances in neural information processing systems, pp 1257–1264
Snoek J, Rippel O, Swersky K, Kiros R, Satish N, Sundaram N, Patwary M, Prabhat MR, Adams R (2015) Scalable Bayesian optimization using deep neural networks. In: International conference on machine learning, pp 2171–2180
Snoek J, Swersky K, Zemel R, Adams R (2014) Input warping for Bayesian optimization of non-stationary functions. In: International conference on machine learning, pp 1674–1682
Titsias M (2009) Variational learning of inducing variables in sparse Gaussian processes. In: Artificial intelligence and statistics, pp 567–574
Toal DJJ, Keane AJ (2012) Non-stationary kriging for design optimization. Eng Optim 44(6):741–765
Viana FAC, Haftka RT, Watson LT (2013) Efficient global optimization algorithm assisted by multiple surrogate techniques. J Glob Optim 56(2):669–689
Vidakovic B (2009) Statistical modeling by wavelets, vol 503. Wiley, New York
Wang G, Shan S (2007) Review of metamodeling techniques in support of engineering design optimization. J Mech Des 129(4):370–380
Watson AG, Barnes RJ (1995) Infill sampling criteria to locate extremes. Math Geol 27(5):589–608
Wild SM, Regis RG, Shoemaker CA (2008) Orbit: optimization by radial basis function interpolation in trust-regions. SIAM J Sci Comput 30(6):3197–3219
Xiong Y, Chen W, Apley D, Ding X (2007) A non-stationary covariance-based kriging method for metamodelling in engineering design. Int J Numer Methods Eng 71(6):733–756
Acknowledgements
This work is co-funded by ONERA-The French Aerospace Lab and Université de Lille, in the context of a joint PhD thesis. Discussions with Hugh Salimbeni and Zhenwen Dai were very helpful for this work; special thanks to them. The experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).
Appendices
Appendix A: Functions
Modified Xiong function (Fig. 22):
Modified TNK constraint function (Fig. 23):
10d Trid function (Fig. 24):
\[ f(\mathbf{x}) = \sum_{i=1}^{10} (x_i - 1)^2 - \sum_{i=2}^{10} x_i x_{i-1}, \qquad x_i \in [-100, 100] \]
Hartmann-6d function (Fig. 25):
\[ f(\mathbf{x}) = -\sum_{i=1}^{4} \alpha_i \exp\left(-\sum_{j=1}^{6} A_{ij}\,(x_j - P_{ij})^2\right), \qquad x_j \in [0, 1] \]
with:
\[ \alpha = (1.0,\; 1.2,\; 3.0,\; 3.2)^{\top} \]
and
\[ A = \begin{pmatrix} 10 & 3 & 17 & 3.5 & 1.7 & 8 \\ 0.05 & 10 & 17 & 0.1 & 8 & 14 \\ 3 & 3.5 & 1.7 & 10 & 17 & 8 \\ 17 & 8 & 0.05 & 10 & 0.1 & 14 \end{pmatrix} \]
and
\[ P = 10^{-4} \begin{pmatrix} 1312 & 1696 & 5569 & 124 & 8283 & 5886 \\ 2329 & 4135 & 8307 & 3736 & 1004 & 9991 \\ 2348 & 1451 & 3522 & 2883 & 3047 & 6650 \\ 4047 & 8828 & 8732 & 5743 & 1091 & 381 \end{pmatrix} \]
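For reference, the two standard benchmarks above can be sketched in Python. These are the usual definitions from the optimization literature, assumed to match the paper's usage; the modified Xiong and TNK functions are paper-specific and are not reproduced here.

```python
# Standard Trid (d = 10) and Hartmann-6d benchmark functions.
import numpy as np

def trid_10d(x):
    """Trid function in 10 dimensions; global minimum -210 at x_i = i(11 - i)."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - 1.0) ** 2) - np.sum(x[1:] * x[:-1])

# Hartmann-6d constants (standard values from the literature)
ALPHA = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([[10, 3, 17, 3.5, 1.7, 8],
              [0.05, 10, 17, 0.1, 8, 14],
              [3, 3.5, 1.7, 10, 17, 8],
              [17, 8, 0.05, 10, 0.1, 14]])
P = 1e-4 * np.array([[1312, 1696, 5569, 124, 8283, 5886],
                     [2329, 4135, 8307, 3736, 1004, 9991],
                     [2348, 1451, 3522, 2883, 3047, 6650],
                     [4047, 8828, 8732, 5743, 1091, 381]])

def hartmann_6d(x):
    """Hartmann-6d function on [0, 1]^6; global minimum about -3.32237."""
    x = np.asarray(x, dtype=float)
    inner = np.sum(A * (x - P) ** 2, axis=1)
    return -np.sum(ALPHA * np.exp(-inner))
```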
Appendix B: Experimental setup
- All experiments were executed on Grid’5000 using a Tesla P100 GPU. The code is based on GPflow (de G Matthews et al. 2017) and Doubly-Stochastic-DGP (Salimbeni and Deisenroth 2017).
- For all DGPs, RBF kernels are used with length-scale and variance initialized to 1, unless they inherit an initialization from a previous DGP. The data are scaled to zero mean and unit variance.
- The Adam optimizer is set with \(\beta _1=0.8\), \(\beta _2=0.9\), and a step size \(\gamma ^{adam}=0.01\).
- The natural gradient step size is initialized for all layers at \(\gamma ^{nat}=0.1\).
- For BO with DGP, the number of successive updates before re-optimizing from scratch is 5.
- The infill criteria are optimized using a parallel differential evolution algorithm with a population of 400 and 100 generations.
- A GitHub repository featuring the BO & DGP algorithm will be made available after publication of the paper.
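The last step of the setup above, optimizing an infill criterion by differential evolution, can be sketched as follows. This is an illustrative sketch, not the authors' released code: the toy `surrogate` function stands in for the DGP's predictive mean and standard deviation, and SciPy's serial `differential_evolution` replaces the parallel implementation mentioned above.

```python
# Hedged sketch: maximizing the expected improvement (EI) infill criterion
# with differential evolution, as in the experimental setup.
import numpy as np
from scipy.stats import norm
from scipy.optimize import differential_evolution

def expected_improvement(mu, sigma, y_best):
    """EI for minimization, given the surrogate mean/std at a candidate point."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def surrogate(x):
    """Toy stand-in for a DGP posterior: returns (mean, std) at x."""
    mu = np.sin(3.0 * x[0]) + 0.1 * x[0] ** 2
    sigma = 0.2 + 0.1 * abs(np.cos(x[0]))
    return mu, sigma

y_best = -0.5  # best observed objective value so far

def neg_ei(x):
    mu, sigma = surrogate(x)
    return -expected_improvement(mu, sigma, y_best)

# SciPy's total population is popsize * n_dims; maxiter caps the generations.
res = differential_evolution(neg_ei, bounds=[(-2.0, 2.0)],
                             popsize=40, maxiter=100, seed=0)
x_next = res.x  # next infill point proposed by the criterion
```

Any surrogate exposing a predictive mean and standard deviation, such as the DGP layers described in the paper, could be plugged in place of the toy `surrogate`.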
Cite this article
Hebbal, A., Brevault, L., Balesdent, M. et al. Bayesian optimization using deep Gaussian processes with applications to aerospace system design. Optim Eng 22, 321–361 (2021). https://doi.org/10.1007/s11081-020-09517-8