Abstract
Bayesian Optimization using Gaussian Processes is a popular approach for optimization problems involving expensive black-box functions. However, because classic Gaussian Processes assume a stationary covariance function, this method may be poorly suited to the non-stationary functions that arise in such problems. To overcome this issue, Deep Gaussian Processes can be used as surrogate models instead of classic Gaussian Processes. By considering a functional composition of stationary Gaussian Processes arranged in a multiple-layer structure, this modeling technique gains the representational power needed to capture non-stationarity. This paper investigates the application of Deep Gaussian Processes within the Bayesian Optimization context. The specificities of this optimization method are discussed and highlighted on academic test cases. The performance of Bayesian Optimization with Deep Gaussian Processes is assessed on analytical test cases and aerospace design optimization problems, and compared with state-of-the-art stationary and non-stationary Bayesian Optimization approaches.
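The "functional composition of stationary Gaussian Processes" mentioned above can be illustrated with a minimal sketch (not the authors' code): a two-layer DGP prior sample is obtained by drawing a path from a stationary GP and feeding it as input to a second stationary GP, which yields a non-stationary overall prior.

```python
# Illustrative sketch, assuming RBF kernels and zero-mean layers:
# a two-layer DGP prior sample as the composition f2(f1(x)).
import numpy as np

def rbf_cov(a, b, lengthscale=1.0, variance=1.0):
    """Stationary squared-exponential covariance k(a, b)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def sample_gp(x, lengthscale, rng):
    """Draw one sample path of a zero-mean GP at inputs x."""
    K = rbf_cov(x, x, lengthscale) + 1e-8 * np.eye(len(x))  # jitter for stability
    return rng.multivariate_normal(np.zeros(len(x)), K)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
h = sample_gp(x, lengthscale=0.3, rng=rng)  # hidden layer: warps the input space
y = sample_gp(h, lengthscale=0.3, rng=rng)  # output layer: GP over the warped inputs
# y is a draw from a non-stationary prior even though each layer is stationary
```

Each layer alone is stationary; the composition is not, which is what gives DGPs their extra representational power.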
References
Amari S-I, Douglas SC (1998) Why natural gradient? In: Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing, ICASSP’98 (Cat. No. 98CH36181), vol 2. IEEE, pp 1213–1216
Bouhlel MA, Bartoli N, Regis RG, Otsmane A, Morlier J (2018) Efficient global optimization for high-dimensional constrained problems by using the kriging models combined with the partial least squares method. Eng Optim 50(12):2038–2053
Atkinson PM, Lloyd CD (2007) Non-stationary variogram models for geostatistical sampling optimisation: an empirical investigation using elevation data. Comput Geosci 33(10):1285–1300
Audet C, Denni J, Moore D, Booker A, Frank P (2000) A surrogate-model-based method for constrained optimization. In 8th symposium on multidisciplinary analysis and optimization, p 4891
Bartoli N, Lefebvre T, Dubreuil S, Olivanti R, Priem R, Bons N, Martins JRRA, Morlier J (2019) Adaptive modeling strategy for constrained global optimization with application to aerodynamic wing design. Aerosp Sci Technol 90:85–102
Basu K, Ghosh S (2017) Analysis of Thompson sampling for Gaussian process optimization in the bandit setting. arXiv preprint arXiv:1705.06808
Breiman L (2017) Classification and regression trees. Routledge, Abingdon
Bui T, Hernández-Lobato D, Hernández-Lobato JM, Li Y, Turner R (2016) Deep Gaussian processes for regression using approximate expectation propagation. In: International conference on machine learning, pp 1472–1481
Cordery I, Yao SL (1993) Non-stationarity of phenomena related to drought. In: Extreme hydrological events: proceedings of the international symposium, Yokohama, 1993
Cox DD, John S (1997) SDO: a statistical method for global optimization. In: Multidisciplinary design optimization: state-of-the-art, pp 315–329
Dai Z, Damianou A, González J, Lawrence N (2015) Variational auto-encoded deep Gaussian processes. arXiv preprint arXiv:1511.06455
Damianou A, Lawrence N (2013) Deep Gaussian processes. In: Artificial intelligence and statistics, pp 207–215
de G Matthews AG, van der Wilk M, Nickson T, Fujii K, Boukouvalas A, León-Villagrá P, Ghahramani Z, Hensman J (2017) GPflow: a Gaussian process library using TensorFlow. J Mach Learn Res 18(40):1–6
Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley, New York
Frazier PI (2018) A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811
Garg S, Singh A, Ramos F (2012) Learning non-stationary space-time models for environmental monitoring. In: Twenty-sixth AAAI conference on artificial intelligence, Toronto
Gibbs MN (1998) Bayesian Gaussian processes for regression and classification. PhD thesis, University of Cambridge
Gramacy RB, Apley DW (2015) Local Gaussian process approximation for large computer experiments. J Comput Gr Stat 24(2):561–578
Gramacy RB, Lee HKH (2008) Bayesian treed Gaussian process models with an application to computer modeling. J Am Stat Assoc 103(483):1119–1130
Gray JS, Hwang JT, Martins JRRA, Moore KT, Naylor BA (2019) OpenMDAO: an open-source framework for multidisciplinary design, analysis, and optimization. Struct Multidiscip Optim 59:1075–1104
Haas TC (1990) Kriging and automated variogram modeling within a moving window. Atmos Environ Part A Gen Top 24(7):1759–1769
Havasi M, Hernández-Lobato JM, Murillo-Fuentes JJ (2018) Inference in deep Gaussian processes using stochastic gradient Hamiltonian Monte Carlo. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems, vol 31. Curran Associates Inc, New York, pp 7506–7516
Hensman J, Fusi N, Lawrence ND (2013) Gaussian processes for big data. arXiv preprint arXiv:1309.6835
Hernández-Lobato JM, Hoffman MW, Ghahramani Z (2014) Predictive entropy search for efficient global optimization of black-box functions. In: Advances in neural information processing systems, pp 918–926
Higdon D, Swall J, Kern J (1999) Non-stationary spatial modeling. Bayesian Stat 6(1):761–768
Hoffman MD, Brochu E, de Freitas N (2011) Portfolio allocation for Bayesian optimization. In: UAI. Citeseer, pp 327–336
Huang W, Zhao D, Sun F, Liu H, Chang E (2015) Scalable Gaussian process regression using deep neural networks. In: Twenty-fourth international joint conference on artificial intelligence
Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Glob Optim 13(4):455–492
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Konda S (2006) Fitting models of nonstationary time series: an application to EEG data. PhD thesis, Case Western Reserve University
Krityakierne T, Ginsbourger D (2015) Global optimization with sparse and local Gaussian process models. In: International workshop on machine learning, optimization and big data. Springer, Berlin, pp 185–196
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436
Marmin S, Ginsbourger D, Baccou J, Liandrat J (2018) Warped Gaussian processes and derivative-based sequential designs for functions with heterogeneous variations. SIAM/ASA J Uncertain Quantif 6(3):991–1018
Milly PCD, Betancourt J, Falkenmark M, Hirsch RM, Kundzewicz ZW, Lettenmaier DP, Stouffer RJ (2008) Stationarity is dead: Whither water management? Science 319(5863):573–574
Močkus J (1975) On Bayesian methods for seeking the extremum. In: Optimization techniques IFIP technical conference. Springer, Berlin, pp 400–404
Paciorek CJ, Schervish MJ (2006) Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17(5):483–506
Papoulis A, Unnikrishna P (1991) Probability, random variables and stochastic processes. Tata McGraw-Hill Education, New York
Parr JM, Keane AJ, Forrester AIJ, Holden CME (2012) Infill sampling criteria for surrogate-based optimization with constraint handling. Eng Optim 44(10):1147–1166
Picheny V, Wagner T, Ginsbourger D (2013) A benchmark of kriging-based infill criteria for noisy optimization. Struct Multidiscip Optim 48(3):607–626
Picheny V, Gramacy RB, Wild S, Le Digabel S (2016) Bayesian optimization under mixed constraints with a slack-variable augmented lagrangian. In: Advances in neural information processing systems, pp 1435–1443
Powell MJD (2009) The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, pp 26–46
Powell MJD (2003) On trust region methods for unconstrained minimization without derivatives. Math Program 97(3):605–623
Priem R, Bartoli N, Diouane Y (2019) On the use of upper trust bounds in constrained Bayesian optimization infill criteria. In: AIAA aviation 2019 forum, p 2986
Qin AK, Huang VL, Suganthan PN (2009) Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans Evol Comput 13(2):398–417
Rasmussen CE, Ghahramani Z (2002) Infinite mixtures of Gaussian process experts. In: Advances in neural information processing systems, pp 881–888
Rasmussen C, Williams CKI (2006) Gaussian processes for machine learning, vol 1. MIT Press, Cambridge
Remes S, Heinonen M, Kaski S (2017) Non-stationary spectral kernels. In: Advances in neural information processing systems, pp 4642–4651
Salimbeni H, Deisenroth M (2017) Doubly stochastic variational inference for deep Gaussian processes. In: Advances in neural information processing systems, pp 4588–4599
Salimbeni H, Eleftheriadis S, Hensman J (2018) Natural gradients in practice: non-conjugate variational inference in Gaussian process models. In: Artificial intelligence and statistics
Sampson P, Guttorp PD (1992) Nonparametric estimation of nonstationary spatial covariance structure. J Am Stat Assoc 87(417):108–119
Sasena MJ (2002) Flexibility and efficiency enhancements for constrained global design optimization with kriging approximations. PhD thesis, University of Michigan Ann Arbor, MI
Sasena MJ, Papalambros PY, Goovaerts P (2001) The use of surrogate modeling algorithms to exploit disparities in function computation time within simulation-based optimization. Constraints 2:5
Schonlau M, Welch WJ, Jones D (1996) Global optimization with nonparametric function fitting. In: Proceedings of the ASA, section on physical and engineering sciences, pp 183–186
Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N (2015) Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE 104(1):148–175
Shahriari B, Wang Z, Hoffman MW, Bouchard-Côté A, de Freitas N (2014) An entropy search portfolio for Bayesian optimization. arXiv preprint arXiv:1406.4625
Snelson E, Ghahramani Z (2006) Sparse Gaussian processes using pseudo-inputs. In: Advances in neural information processing systems, pp 1257–1264
Snoek J, Rippel O, Swersky K, Kiros R, Satish N, Sundaram N, Patwary M, Prabhat MR, Adams R (2015) Scalable Bayesian optimization using deep neural networks. In: International conference on machine learning, pp 2171–2180
Snoek J, Swersky K, Zemel R, Adams R (2014) Input warping for Bayesian optimization of non-stationary functions. In: International conference on machine learning, pp 1674–1682
Titsias M (2009) Variational learning of inducing variables in sparse Gaussian processes. In: Artificial intelligence and statistics, pp 567–574
Toal DJJ, Keane AJ (2012) Non-stationary kriging for design optimization. Eng Optim 44(6):741–765
Viana FAC, Haftka RT, Watson LT (2013) Efficient global optimization algorithm assisted by multiple surrogate techniques. J Glob Optim 56(2):669–689
Vidakovic B (2009) Statistical modeling by wavelets, vol 503. Wiley, New York
Wang G, Shan S (2007) Review of metamodeling techniques in support of engineering design optimization. J Mech Des 129(4):370–380
Watson AG, Barnes RJ (1995) Infill sampling criteria to locate extremes. Math Geol 27(5):589–608
Wild SM, Regis RG, Shoemaker CA (2008) Orbit: optimization by radial basis function interpolation in trust-regions. SIAM J Sci Comput 30(6):3197–3219
Xiong Y, Chen W, Apley D, Ding X (2007) A non-stationary covariance-based kriging method for metamodelling in engineering design. Int J Numer Methods Eng 71(6):733–756
Acknowledgements
This work is co-funded by ONERA-The French Aerospace Lab and Université de Lille, in the context of a joint PhD thesis. Discussions with Hugh Salimbeni and Zhenwen Dai were very helpful for this work; special thanks to them. The experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).
Appendices
Appendix A: Functions
Modified Xiong function (Fig. 22):
Modified TNK constraint function (Fig. 23):
10d Trid function (Fig. 24):
\[ f(\mathbf{x}) = \sum_{i=1}^{10} (x_i - 1)^2 - \sum_{i=2}^{10} x_i x_{i-1}, \qquad x_i \in [-100, 100] \]
Hartmann-6d function (Fig. 25):
\[ f(\mathbf{x}) = -\sum_{i=1}^{4} \alpha_i \exp\left(-\sum_{j=1}^{6} A_{ij}\,(x_j - P_{ij})^2\right), \qquad x_j \in [0, 1] \]
with:
\[ \alpha = (1.0,\; 1.2,\; 3.0,\; 3.2)^{\top} \]
and
\[ A = \begin{pmatrix} 10 & 3 & 17 & 3.5 & 1.7 & 8 \\ 0.05 & 10 & 17 & 0.1 & 8 & 14 \\ 3 & 3.5 & 1.7 & 10 & 17 & 8 \\ 17 & 8 & 0.05 & 10 & 0.1 & 14 \end{pmatrix} \]
and
\[ P = 10^{-4} \begin{pmatrix} 1312 & 1696 & 5569 & 124 & 8283 & 5886 \\ 2329 & 4135 & 8307 & 3736 & 1004 & 9991 \\ 2348 & 1451 & 3522 & 2883 & 3047 & 6650 \\ 4047 & 8828 & 8732 & 5743 & 1091 & 381 \end{pmatrix} \]
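For reference, the two standard benchmarks above can be sketched in Python. These are the usual definitions from the optimization literature, assumed to match the paper's usage; the modified Xiong and TNK functions are paper-specific and are not reproduced here.

```python
# Standard Trid (d = 10) and Hartmann-6d benchmark functions.
import numpy as np

def trid_10d(x):
    """Trid function in 10 dimensions; global minimum -210 at x_i = i(11 - i)."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - 1.0) ** 2) - np.sum(x[1:] * x[:-1])

# Hartmann-6d constants (standard values from the literature)
ALPHA = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([[10, 3, 17, 3.5, 1.7, 8],
              [0.05, 10, 17, 0.1, 8, 14],
              [3, 3.5, 1.7, 10, 17, 8],
              [17, 8, 0.05, 10, 0.1, 14]])
P = 1e-4 * np.array([[1312, 1696, 5569, 124, 8283, 5886],
                     [2329, 4135, 8307, 3736, 1004, 9991],
                     [2348, 1451, 3522, 2883, 3047, 6650],
                     [4047, 8828, 8732, 5743, 1091, 381]])

def hartmann_6d(x):
    """Hartmann-6d function on [0, 1]^6; global minimum about -3.32237."""
    x = np.asarray(x, dtype=float)
    inner = np.sum(A * (x - P) ** 2, axis=1)
    return -np.sum(ALPHA * np.exp(-inner))
```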
Appendix B: Experimental setup
- All experiments were executed on Grid’5000 using a Tesla P100 GPU. The code is based on GPflow (de G Matthews et al. 2017) and Doubly-Stochastic-DGP (Salimbeni and Deisenroth 2017).
- For all DGPs, RBF kernels are used with length-scale and variance initialized to 1, unless they inherit an initialization from a previous DGP. The data are scaled to zero mean and unit variance.
- The Adam optimizer is set with \(\beta _1=0.8\), \(\beta _2=0.9\), and a step size \(\gamma ^{adam}=0.01\).
- The natural gradient step size is initialized for all layers at \(\gamma ^{nat}=0.1\).
- For BO with DGP, the number of successive updates before re-optimizing from scratch is 5.
- The infill criteria are optimized using a parallel differential evolution algorithm with a population of 400 and 100 generations.
- A GitHub repository featuring the BO & DGP algorithm will be made available after publication of the paper.
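The last step of the setup above, optimizing an infill criterion by differential evolution, can be sketched as follows. This is an illustrative sketch, not the authors' released code: the toy `surrogate` function stands in for the DGP's predictive mean and standard deviation, and SciPy's serial `differential_evolution` replaces the parallel implementation mentioned above.

```python
# Hedged sketch: maximizing the expected improvement (EI) infill criterion
# with differential evolution, as in the experimental setup.
import numpy as np
from scipy.stats import norm
from scipy.optimize import differential_evolution

def expected_improvement(mu, sigma, y_best):
    """EI for minimization, given the surrogate mean/std at a candidate point."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def surrogate(x):
    """Toy stand-in for a DGP posterior: returns (mean, std) at x."""
    mu = np.sin(3.0 * x[0]) + 0.1 * x[0] ** 2
    sigma = 0.2 + 0.1 * abs(np.cos(x[0]))
    return mu, sigma

y_best = -0.5  # best observed objective value so far

def neg_ei(x):
    mu, sigma = surrogate(x)
    return -expected_improvement(mu, sigma, y_best)

# SciPy's total population is popsize * n_dims; maxiter caps the generations.
res = differential_evolution(neg_ei, bounds=[(-2.0, 2.0)],
                             popsize=40, maxiter=100, seed=0)
x_next = res.x  # next infill point proposed by the criterion
```

Any surrogate exposing a predictive mean and standard deviation, such as the DGP layers described in the paper, could be plugged in place of the toy `surrogate`.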
Cite this article
Hebbal, A., Brevault, L., Balesdent, M. et al. Bayesian optimization using deep Gaussian processes with applications to aerospace system design. Optim Eng 22, 321–361 (2021). https://doi.org/10.1007/s11081-020-09517-8