
A derivative-free optimization algorithm for the efficient minimization of functions obtained via statistical averaging


Abstract

This paper considers the efficient minimization of the infinite time average of a stationary ergodic process in the space of a handful of design parameters which affect it. Problems of this class, derived from physical or numerical experiments which are sometimes expensive to perform, are ubiquitous in engineering applications. In such problems, any given function evaluation, determined with finite sampling, is associated with a quantifiable amount of uncertainty, which may be reduced via additional sampling. The present paper proposes a new optimization algorithm to adjust the amount of sampling associated with each function evaluation, making function evaluations more accurate (and, thus, more expensive), as required, as convergence is approached. The work builds on our algorithm for Delaunay-based Derivative-free Optimization via Global Surrogates (\({\varDelta }\)-DOGS, see JOGO https://doi.org/10.1007/s10898-015-0384-2). The new algorithm, dubbed \(\alpha \)-DOGS, substantially reduces the overall cost of the optimization process for problems of this important class. Further, under certain well-defined conditions, rigorous proof of convergence to the global minimum of the problem considered is established.



References

  1. Alimo, S., Beyhaghi, P., Meneghello, G., Bewley, T.: Delaunay-based optimization in CFD leveraging multivariate adaptive polyharmonic splines (MAPS). In: 58th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, p. 0129 (2017)

  2. Alimo, S.R., Beyhaghi, P., Bewley, T.R.: Optimization combining derivative-free global exploration with derivative-based local refinement. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 2531–2538. IEEE (2017)

  3. Amaioua, N., Audet, C., Conn, A.R., Le Digabel, S.: Efficient solution of quadratically constrained quadratic subproblems within the mesh adaptive direct search algorithm. Eur. J. Oper. Res. 268(1), 13–24 (2018)

  4. Audet, C., Conn, A.R., Le Digabel, S., Peyrega, M.: A progressive barrier derivative-free trust-region algorithm for constrained optimization. Comput. Optim. Appl. 71(2), 307–329 (2018)

  5. Audet, C., Hare, W.: Derivative-Free and Blackbox Optimization. Springer, Berlin (2017)

  6. Audet, C., Tribes, C.: Mesh-based Nelder–Mead algorithm for inequality constrained optimization. Comput. Optim. Appl. 71(2), 331–352 (2018)

  7. Awad, H.P., Glynn, P.W.: On an initial transient deletion rule with rigorous theoretical support. In: Proceedings of the 38th Conference on Winter Simulation, pp. 186–191. Winter Simulation Conference (2006)

  8. Beran, J.: Statistics for Long-Memory Processes, vol. 61. CRC Press, Boca Raton (1994)

  9. Beran, J.: Maximum likelihood estimation of the differencing parameter for invertible short and long memory autoregressive integrated moving average models. J. R. Stat. Soc. Ser. B (Methodol.) 57, 659–672 (1995)

  10. Bewley, T.R., Moin, P., Temam, R.: DNS-based predictive control of turbulence: an optimal benchmark for feedback algorithms. J. Fluid Mech. 447, 179–225 (2001)

  11. Beyhaghi, P., Alimohammadi, S., Bewley, T.: Uncertainty quantification of the time averaging of a statistics computed from numerical simulation of turbulent flow (2018). arXiv preprint arXiv:1802.01056

  12. Beyhaghi, P., Bewley, T.: Implementation of Cartesian grids to accelerate Delaunay-based derivative-free optimization. J. Glob. Optim. 69(4), 927–949 (2017)

  13. Beyhaghi, P., Bewley, T.R.: Delaunay-based derivative-free optimization via global surrogates, part II: convex constraints. J. Glob. Optim. 66, 383–415 (2016)

  14. Beyhaghi, P., Cavaglieri, D., Bewley, T.: Delaunay-based derivative-free optimization via global surrogates, part I: linear constraints. J. Glob. Optim. 66, 1–52 (2015)

  15. Booker, A.J., Dennis Jr., J.E., Frank, P.D., Serafini, D.B., Torczon, V., Trosset, M.W.: A rigorous framework for optimization of expensive functions by surrogates. Struct. Optim. 17(1), 1–13 (1999)

  16. Bubeck, S., Munos, R., Stoltz, G., Szepesvari, C.: X-armed bandits. J. Mach. Learn. Res. 12, 1655–1695 (2011)

  17. Conn, A.R., Scheinberg, K., Vicente, L.N.: Global convergence of general derivative-free trust-region algorithms to first-and second-order critical points. SIAM J. Optim. 20(1), 387–415 (2009)

  18. Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization, vol. 8. SIAM, Philadelphia (2009)

  19. Deng, G., Ferris, M.C.: Extension of the DIRECT optimization algorithm for noisy functions. In: Proceedings of the 39th Conference on Winter Simulation: 40 Years! The Best is Yet to Come, pp. 497–504. IEEE Press (2007)

  20. Duchon, J.: Splines minimizing rotation-invariant semi-norms in Sobolev spaces. In: Constructive Theory of Functions of Several Variables, pp. 85–100. Springer (1977)

  21. Goluskin, D.: Bounding averages rigorously using semidefinite programming: mean moments of the Lorenz system. J. Nonlinear Sci. 28, 621–651 (2017)

  22. Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79(1), 157–181 (1993)

  23. Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pp. 681–690. ACM (2008)

  24. Lasserre, J.B.: An Introduction to Polynomial and Semi-algebraic Optimization, vol. 52. Cambridge University Press, Cambridge (2015)

  25. Marsden, A.L., Wang, M., Dennis Jr., J.E., Moin, P.: Optimal aeroacoustic shape design using the surrogate management framework. Optim. Eng. 5(2), 235–262 (2004)

  26. Marsden, A.L., Wang, M., Dennis Jr., J.E., Moin, P.: Suppression of vortex-shedding noise via derivative-free shape optimization. Phys. Fluids 16(10), 83–86 (2004)

  27. Nie, J., Yang, L., Zhong, S.: Stochastic polynomial optimization. In: Optimization Methods and Software, pp. 1–19 (2019)

  28. Norkin, V.I., Pflug, G.C., Ruszczyński, A.: A branch and bound method for stochastic global optimization. Math. Program. 83(1–3), 425–450 (1998)

  29. Oliver, T.A., Malaya, N., Ulerich, R., Moser, R.D.: Estimating uncertainties in statistics computed from direct numerical simulation. Phys. Fluids (1994-present) 26(3), 035101 (2014)

  30. Picheny, V., Ginsbourger, D., Richet, Y., Caplin, G.: Quantile-based optimization of noisy computer experiments with tunable precision. Technometrics 55(1), 2–13 (2013)

  31. Quan, N., Yin, J., Ng, S.H., Lee, L.H.: Simulation optimization via kriging: a sequential search using expected improvement with computing budget constraints. IIE Trans. 45(7), 763–780 (2013)

  32. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)

  33. Rullière, D., Faleh, A., Planchet, F., Youssef, W.: Exploring or reducing noise? Struct. Multidiscip. Optim. 47(6), 921–936 (2013)

  34. Salesky, S.T., Chamecki, M., Dias, N.L.: Estimating the random error in eddy-covariance based fluxes and other turbulence statistics: the filtering method. Bound.-Layer Meteorol. 144(1), 113–135 (2012)

  35. Sankaran, S., Audet, C., Marsden, A.L.: A method for stochastic constrained optimization using derivative-free surrogate pattern search and collocation. J. Comput. Phys. 229(12), 4664–4682 (2010)

  36. Schonlau, M., Welch, W.J., Jones, D.R.: A data-analytic approach to Bayesian global optimization. In: Department of Statistics and Actuarial Science and The Institute for Improvement in Quality and Productivity, 1997 ASA Conference (1997)

  37. Sethi, S.P., Zhang, Q., Zhang, H.-Q.: Average-Cost Control of Stochastic Manufacturing Systems, vol. 54. Springer, Berlin (2005)

  38. Slivkins, A.: Multi-armed bandits on implicit metric spaces. In: Advances in Neural Information Processing Systems, pp. 1602–1610 (2011)

  39. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp. 2951–2959 (2012)

  40. Srinivas, N., Krause, A., Kakade, S.M., Seeger, M.W.: Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans. Inf. Theory 58(5), 3250–3265 (2012)

  41. Stewart, I.: Mathematics: The Lorenz attractor exists. Nature 406(6799), 948 (2000)

  42. Talgorn, B., Le Digabel, S., Kokkolaras, M.: Statistical surrogate formulations for simulation-based design optimization. J. Mech. Des. 137(2), 021405 (2015)

  43. Talnikar, C., Blonigan, P., Bodart, J., Wang, Q.: Parallel optimization for LES. In: Proceedings of the Summer Program, p. 315 (2014)

  44. Theunissen, R., Di Sante, A., Riethmuller, M.L., Van den Braembussche, R.A.: Confidence estimation using dependent circular block bootstrapping: application to the statistical analysis of PIV measurements. Exp. Fluids 44(4), 591–596 (2008)

  45. Valko, M., Carpentier, A., Munos, R.: Stochastic simultaneous optimistic optimization. In: International Conference on Machine Learning, pp. 19–27 (2013)

  46. Wahba, G.: Spline Models for Observational Data, vol. 59. SIAM, Philadelphia (1990)

  47. Zhao, M., Alimo, S.R., Bewley, T.R.: An active subspace method for accelerating convergence in Delaunay-based optimization via dimension reduction. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 2765–2770. IEEE (2018)


Acknowledgements

We would like to thank the reviewers for their insightful comments, which led to improvements in this work. We also gratefully acknowledge Prof. Phillip Gill and Prof. Alison Marsden for their collaboration, and funding from AFOSR FA 9550-12-1-0046, the Cymer Center for Control Systems & Dynamics, and Leidos Corporation in support of this work.

Author information

Corresponding author

Correspondence to Pooriya Beyhaghi.

Appendices

Appendix 1: Polyharmonic spline regression

The algorithm described in this paper depends upon a smooth regression \(p^k(x)\) (see Assumption 1). The best technique for computing this regression is problem dependent. As in [12,13,14], a key advantage of our Delaunay-based approach in the present work is that it facilitates the use of any suitable regression technique, provided it satisfies the “strict” regression property given in Definition 4. Since our numerical tests all implement the polyharmonic spline regression technique, its derivation is briefly explained in this appendix; additional details may be found in [46].

The polyharmonic spline regression p(x) of a function f(x) in \({\mathbb {R}}^n\) is defined as a weighted sum of radial basis functions \(\varphi (r_i)\), each centered at a measurement point \(x_i\), plus a linear function of x:

$$\begin{aligned} p(x) = \sum _{i = 1}^N w_i\,\varphi (r_i) + v^T \begin{bmatrix} 1 \\ x \end{bmatrix}, \quad \text {where} \quad \varphi (r) = r^3 \quad \text {and} \quad r_i = {\Vert }x - x_i{\Vert }. \end{aligned}$$
(47)

The weights \(w_i\) and the coefficients \(v_i\) represent N and \(n+1\) unknowns, respectively. Assume that \(\{y(x_1), y(x_2), \dots , y(x_N)\}\) is the set of measurements, with standard deviations \(\{\sigma _1, \sigma _2, \dots , \sigma _N\}\). The \(w_i\) and \(v_i\) coefficients are computed by minimizing the following objective function, which expresses a tradeoff between the fit to the observed data and the smoothness of the regressor:

$$\begin{aligned} L_p = \sum _{i=1}^N \left[ \frac{p(x_i)-y(x_i)}{\sigma _i} \right] ^2 + \lambda \int _{B} {\Vert }\nabla ^m p(x){\Vert }^2 \, dx, \end{aligned}$$
(48)

where B is a large box domain containing all of the \(x_i\), and \(\nabla ^m p(x)\) is the vector of all m-th-order derivatives of p(x) (see [20]). It is shown in [20] that the first-order optimality condition for the objective function (48) is:

$$\begin{aligned} p(x_i)-y(x_i)+ \rho \,\sigma _i^2 w_i=0, \quad \forall 1 \le i \le N, \end{aligned}$$
(49)

where \(\rho \) is a parameter proportional to \(\lambda \). In summary, the coefficients of the regression can be derived by solving:

$$\begin{aligned}&\begin{bmatrix} F & V^T \\ V & 0 \end{bmatrix} \begin{bmatrix} w \\ v \end{bmatrix}= \begin{bmatrix} y \\ 0 \end{bmatrix}, \nonumber \\&\quad F_{ij} = \varphi ({\Vert }x_i-x_j{\Vert }) + \rho \,\delta _{ij} \,\sigma _i^2, \qquad V = \begin{bmatrix} 1 & 1 & \dots & 1 \\ x_1 & x_2 & \dots & x_N \end{bmatrix}, \end{aligned}$$
(50)

where y is the vector of measurements \(y(x_i)\) and \(\delta _{ij}\) is the Kronecker delta.
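
For concreteness, a minimal Python/NumPy sketch of assembling and solving the saddle-point system (50), and of evaluating the resulting regression (47), is given below. The function names, the dense direct solve, and the absence of any safeguarding are illustrative assumptions, not part of the published implementation.

```python
import numpy as np

def polyharmonic_spline_regression(X, y, sigma, rho):
    """Solve system (50) for the weights w and linear coefficients v.

    X     : (N, n) array of measurement points x_i
    y     : (N,)   array of measurements y(x_i)
    sigma : (N,)   array of standard deviations sigma_i
    rho   : smoothing parameter in [0, inf)
    """
    N, n = X.shape
    # Pairwise distances r_ij = ||x_i - x_j|| and cubic kernel phi(r) = r^3
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    F = r**3 + rho * np.diag(sigma**2)   # F_ij = phi(||x_i - x_j||) + rho delta_ij sigma_i^2
    V = np.vstack([np.ones(N), X.T])     # (n+1, N) block of ones and coordinates
    A = np.block([[F, V.T], [V, np.zeros((n + 1, n + 1))]])
    b = np.concatenate([y, np.zeros(n + 1)])
    sol = np.linalg.solve(A, b)          # saddle-point system (50)
    return sol[:N], sol[N:]

def evaluate_spline(x, X, w, v):
    """Evaluate p(x) = sum_i w_i ||x - x_i||^3 + v^T [1; x], as in (47)."""
    r = np.linalg.norm(X - x, axis=1)
    return w @ r**3 + v @ np.concatenate([[1.0], x])
```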

The remaining task when computing the regression is to find an appropriate value of \(\rho \in [0, \infty )\). Solving (50) for any value of \(\rho \) gives a unique regression, denoted \(p(x,\rho )\). The parameter \(\rho \) is then obtained from a predictive mean-square error criterion developed in §4.4 of [46], which is imposed via the following condition:

$$\begin{aligned} T(\rho )=\sum _{i=1}^N \left[ \frac{p(x_i,\rho )-y(x_i)}{\sigma _i}\right] ^2=1. \end{aligned}$$
(51)

As \(\rho \rightarrow \infty \), \(w_i\rightarrow 0\), and the solution of (50) reduces to a weighted least-squares linear regression. If \(T(\infty ) \le 1\), we take this linear regression as the best current regression for the available data. Otherwise, we have \(T(\infty )>1\) and (by construction) \(T(0)=0\); thus, (51) has a solution with finite \(\rho >0\), which gives the desired regression.

If \(T(\infty ) > 1\), we thus seek a \(\rho \) for which \(T(\rho )=1\). Following [46] and using (49), (51) simplifies to:

$$\begin{aligned} T(\rho )=\rho ^2 \sum _{i=1}^N \left( w_{i,\rho } \,\sigma _i\right) ^2 =1, \end{aligned}$$
(52)

where \(w_{i, \rho }\) denotes the \(w_i\) obtained by solving (50) for the given \(\rho \). Defining Dw and Dv as the vectors whose i-th elements are the derivatives of \(w_i\) and \(v_i\) with respect to \(\rho \), it follows that

$$\begin{aligned}&T'(\rho )= 2\,\rho ^2 \sum _{i=1}^N w_{i,\rho } \, Dw_i \, \sigma _i^2 + 2\, \rho \sum _{i=1}^N \left( w_{i,\rho } \,\sigma _i\right) ^2, \\&\begin{bmatrix} F & V^T \\ V & 0 \end{bmatrix} \begin{bmatrix} Dw \\ Dv \end{bmatrix}+ \begin{bmatrix} {\varSigma }_2 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} w \\ v \end{bmatrix}= \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \end{aligned}$$

where \({\varSigma }_2\) is the diagonal matrix whose i-th diagonal element is \(\sigma _i^2\). An analytic expression for the derivative of \(T(\rho )\) is therefore available, and (51) can be solved quickly using Newton’s method.
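
A sketch of this Newton iteration follows, reusing the hypothetical polyharmonic_spline_regression helper from the previous listing; the initial guess, tolerance, and lack of safeguarding are assumptions made for brevity.

```python
def fit_rho_newton(X, y, sigma, rho0=1.0, tol=1e-8, max_iter=50):
    """Find rho > 0 with T(rho) = rho^2 sum_i (w_i sigma_i)^2 = 1, as in (52)."""
    import numpy as np
    N, n = X.shape
    rho = rho0
    for _ in range(max_iter):
        w, _ = polyharmonic_spline_regression(X, y, sigma, rho)
        T = rho**2 * np.sum((w * sigma)**2)
        if abs(T - 1.0) < tol:
            break
        # Differentiate (50) with respect to rho:
        #   [[F, V^T], [V, 0]] [Dw; Dv] = -[Sigma_2 w; 0],  Sigma_2 = diag(sigma_i^2)
        r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        F = r**3 + rho * np.diag(sigma**2)
        V = np.vstack([np.ones(N), X.T])
        A = np.block([[F, V.T], [V, np.zeros((n + 1, n + 1))]])
        Dw = np.linalg.solve(A, -np.concatenate([sigma**2 * w, np.zeros(n + 1)]))[:N]
        dT = 2 * rho**2 * np.sum(w * Dw * sigma**2) + 2 * rho * np.sum((w * sigma)**2)
        rho -= (T - 1.0) / dT    # Newton step on T(rho) - 1 = 0
    return rho
```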

The regression process presented here, imposing (51) as suggested by [46], is designed to obtain a regression which is reasonably smooth. However, there is no guarantee that this particular regression satisfies the strictness property required in the present work (see Definition 4). Note, however, that by imposing \(\rho =0\), the regression is made strict for arbitrarily small \(\beta \). Thus, to satisfy strictness for a given finite \(\beta \), the value of \(\rho \) must sometimes be decreased from that which satisfies (51), as necessary.

Appendix 2: UQ for finite-time-averaging of the Lorenz system

This appendix briefly summarizes the simple empirical approach used in this paper to quantify the uncertainty of the cost function (44) when it is estimated using a finite time average.

In the method used, we simply simulated the Lorenz system (42) 30 independent times with different initial values of (X, Y, Z). The cost function was then approximated over a range of averaging times T, and the standard deviation of the estimates at each averaging time was calculated across the 30 realizations. The simple model \({A}/{\sqrt{T}}\) for the uncertainty was found to fit this empirical calculation quite well, as shown in Fig. 10.
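
A minimal Python sketch of this procedure follows. Since the cost function (44), system (42), and sampling details are defined in the main text, the classical Lorenz parameters, the time average of z as a stand-in statistic, the burn-in period, and the set of averaging times used below are illustrative assumptions only.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sig=10.0, r=28.0, b=8.0 / 3.0):
    # Classical Lorenz system; stands in for system (42) of the main text
    x, y, z = s
    return [sig * (y - x), x * (r - z) - y, x * y - b * z]

def finite_time_average(T, s0, dt=0.01, burn_in=10.0):
    # Time average of a stand-in statistic (here z) over [burn_in, burn_in + T]
    t_eval = np.arange(0.0, burn_in + T, dt)
    sol = solve_ivp(lorenz, (0.0, burn_in + T), s0, t_eval=t_eval, rtol=1e-8, atol=1e-8)
    return sol.y[2][t_eval >= burn_in].mean()

rng = np.random.default_rng(0)
T_values = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
std_per_T = []
for T in T_values:
    # 30 independent trajectories from random initial conditions, as in the text
    samples = [finite_time_average(T, rng.normal(scale=5.0, size=3)) for _ in range(30)]
    std_per_T.append(np.std(samples))
std_per_T = np.array(std_per_T)

# Least-squares fit of the uncertainty model sigma(T) = A / sqrt(T)
A = np.sum(std_per_T / np.sqrt(T_values)) / np.sum(1.0 / T_values)
print("fitted A =", A)
```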

Fig. 10: Uncertainty quantification (UQ) for finite-time-average approximations of the infinite-time-average statistic of interest in the cost function related to the Lorenz model. The UQ model \({A}/{\sqrt{T}}\) is found to give a very good empirical fit.

Appendix 3: Application of \(\alpha \)-DOGS to higher-dimensional problems

We now consider the challenge of increasing the dimension of the problem. This is rather difficult, as the computational cost of the algorithm developed herein (specifically, that of computing the Delaunay triangulation) scales rather poorly as the dimension of the problem is increased. In this appendix, the convergence behavior on a 5-dimensional problem is thus investigated.

The test problem considered here is the stochastically-obscured scaled Schwefel function (41) for \(n=5\). As in Sect. 6, the performance of the \(\alpha \)-DOGS, \({\varDelta }\)-DOGS, and SMF algorithms is characterized. In this investigation, a fixed averaging time of \(T=100\) was used for all datapoints in the \({\varDelta }\)-DOGS and SMF algorithms. Figure 11 shows the convergence history of the regret function for each method. It is observed that the total averaging time required by \(\alpha \)-DOGS is significantly smaller than that of the other methods considered.
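
For reference, a hedged sketch of a stochastically-obscured evaluation with a tunable averaging time T is given below; the standard (unscaled) Schwefel form and the \(1/\sqrt{T}\) noise model stand in for the scaled function (41) and the sampling model of the main text.

```python
import numpy as np

def schwefel(x):
    # Standard Schwefel function; a stand-in for the scaled form (41)
    x = np.asarray(x, dtype=float)
    return 418.9829 * x.size - np.sum(x * np.sin(np.sqrt(np.abs(x))))

def noisy_evaluation(x, T, noise_amplitude=1.0, rng=np.random.default_rng()):
    # Finite-time-average evaluation: sampling error decays like 1/sqrt(T)
    return schwefel(x) + noise_amplitude / np.sqrt(T) * rng.standard_normal()
```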

To scale to even higher-dimensional problems, the practical (though perhaps not provably globally convergent) idea of using only approximate Delaunay triangulations has been proposed, and will be considered in future work.

Fig. 11: Convergence history of the optimization algorithms for (41) with \(n=5\).

Cite this article

Beyhaghi, P., Alimo, R. & Bewley, T. A derivative-free optimization algorithm for the efficient minimization of functions obtained via statistical averaging. Comput Optim Appl 76, 1–31 (2020). https://doi.org/10.1007/s10589-020-00172-4
