
A derivative-free optimization algorithm for the efficient minimization of functions obtained via statistical averaging


Abstract

This paper considers the efficient minimization of the infinite time average of a stationary ergodic process in the space of a handful of design parameters which affect it. Problems of this class, derived from physical or numerical experiments which are sometimes expensive to perform, are ubiquitous in engineering applications. In such problems, any given function evaluation, determined with finite sampling, is associated with a quantifiable amount of uncertainty, which may be reduced via additional sampling. The present paper proposes a new optimization algorithm to adjust the amount of sampling associated with each function evaluation, making function evaluations more accurate (and, thus, more expensive), as required, as convergence is approached. The work builds on our algorithm for Delaunay-based Derivative-free Optimization via Global Surrogates (\({\varDelta }\)-DOGS, see JOGO https://doi.org/10.1007/s10898-015-0384-2). The new algorithm, dubbed \(\alpha \)-DOGS, substantially reduces the overall cost of the optimization process for problems of this important class. Further, under certain well-defined conditions, rigorous proof of convergence to the global minimum of the problem considered is established.



References

  1. Alimo, S., Beyhaghi, P., Meneghello, G., Bewley, T.: Delaunay-based optimization in CFD leveraging multivariate adaptive polyharmonic splines (MAPS). In: 58th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, p. 0129 (2017)

  2. Alimo, S.R., Beyhaghi, P., Bewley, T.R.: Optimization combining derivative-free global exploration with derivative-based local refinement. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 2531–2538. IEEE (2017)

  3. Amaioua, N., Audet, C., Conn, A.R., Le Digabel, S.: Efficient solution of quadratically constrained quadratic subproblems within the mesh adaptive direct search algorithm. Eur. J. Oper. Res. 268(1), 13–24 (2018)

  4. Audet, C., Conn, A.R., Le Digabel, S., Peyrega, M.: A progressive barrier derivative-free trust-region algorithm for constrained optimization. Comput. Optim. Appl. 71(2), 307–329 (2018)

  5. Audet, C., Hare, W.: Derivative-Free and Blackbox Optimization. Springer, Berlin (2017)

  6. Audet, C., Tribes, C.: Mesh-based Nelder–Mead algorithm for inequality constrained optimization. Comput. Optim. Appl. 71(2), 331–352 (2018)

  7. Awad, H.P., Glynn, P.W.: On an initial transient deletion rule with rigorous theoretical support. In: Proceedings of the 38th Conference on Winter Simulation, pp. 186–191. Winter Simulation Conference (2006)

  8. Beran, J.: Statistics for Long-Memory Processes, vol. 61. CRC Press, Boca Raton (1994)

  9. Beran, J.: Maximum likelihood estimation of the differencing parameter for invertible short and long memory autoregressive integrated moving average models. J. R. Stat. Soc. Ser. B (Methodol.) 57, 659–672 (1995)

  10. Bewley, T.R., Moin, P., Temam, R.: DNS-based predictive control of turbulence: an optimal benchmark for feedback algorithms. J. Fluid Mech. 447, 179–225 (2001)

  11. Beyhaghi, P., Alimohammadi, S., Bewley, T.: Uncertainty quantification of the time averaging of a statistics computed from numerical simulation of turbulent flow (2018). arXiv preprint arXiv:1802.01056

  12. Beyhaghi, P., Bewley, T.: Implementation of Cartesian grids to accelerate Delaunay-based derivative-free optimization. J. Glob. Optim. 69(4), 927–949 (2017)

  13. Beyhaghi, P., Bewley, T.R.: Delaunay-based derivative-free optimization via global surrogates, part II: convex constraints. J. Glob. Optim. 66, 383–415 (2016)

  14. Beyhaghi, P., Cavaglieri, D., Bewley, T.: Delaunay-based derivative-free optimization via global surrogates, part I: linear constraints. J. Glob. Optim. 66, 1–52 (2015)

  15. Booker, A.J., Dennis Jr., J.E., Frank, P.D., Serafini, D.B., Torczon, V., Trosset, M.W.: A rigorous framework for optimization of expensive functions by surrogates. Struct. Optim. 17(1), 1–13 (1999)

  16. Bubeck, S., Munos, R., Stoltz, G., Szepesvari, C.: X-armed bandits. J. Mach. Learn. Res. 12, 1655–1695 (2011)

  17. Conn, A.R., Scheinberg, K., Vicente, L.N.: Global convergence of general derivative-free trust-region algorithms to first-and second-order critical points. SIAM J. Optim. 20(1), 387–415 (2009)

  18. Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization, vol. 8. SIAM, Philadelphia (2009)

  19. Deng, G., Ferris, M.C.: Extension of the DIRECT optimization algorithm for noisy functions. In: Proceedings of the 39th Conference on Winter Simulation: 40 Years! The Best is Yet to Come, pp. 497–504. IEEE Press (2007)

  20. Duchon, J.: Splines minimizing rotation-invariant semi-norms in Sobolev spaces. In: Constructive Theory of Functions of Several Variables, pp. 85–100. Springer (1977)

  21. Goluskin, D.: Bounding averages rigorously using semidefinite programming: mean moments of the Lorenz system. J. Nonlinear Sci. 28, 621–651 (2017)

  22. Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79(1), 157–181 (1993)

  23. Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pp. 681–690. ACM (2008)

  24. Lasserre, J.B.: An Introduction to Polynomial and Semi-algebraic Optimization, vol. 52. Cambridge University Press, Cambridge (2015)

  25. Marsden, A.L., Wang, M., Dennis Jr., J.E., Moin, P.: Optimal aeroacoustic shape design using the surrogate management framework. Optim. Eng. 5(2), 235–262 (2004)

  26. Marsden, A.L., Wang, M., Dennis Jr., J.E., Moin, P.: Suppression of vortex-shedding noise via derivative-free shape optimization. Phys. Fluids 16(10), 83–86 (2004)

  27. Nie, J., Yang, L., Zhong, S.: Stochastic polynomial optimization. In: Optimization Methods and Software, pp. 1–19 (2019)

  28. Norkin, V.I., Pflug, G.C., Ruszczyński, A.: A branch and bound method for stochastic global optimization. Math. Program. 83(1–3), 425–450 (1998)

  29. Oliver, T.A., Malaya, N., Ulerich, R., Moser, R.D.: Estimating uncertainties in statistics computed from direct numerical simulation. Phys. Fluids (1994-present) 26(3), 035101 (2014)

  30. Picheny, V., Ginsbourger, D., Richet, Y., Caplin, G.: Quantile-based optimization of noisy computer experiments with tunable precision. Technometrics 55(1), 2–13 (2013)

  31. Quan, N., Yin, J., Ng, S.H., Lee, L.H.: Simulation optimization via kriging: a sequential search using expected improvement with computing budget constraints. IIE Trans. 45(7), 763–780 (2013)

  32. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)

  33. Rullière, D., Faleh, A., Planchet, F., Youssef, W.: Exploring or reducing noise? Struct. Multidiscip. Optim. 47(6), 921–936 (2013)

  34. Salesky, S.T., Chamecki, M., Dias, N.L.: Estimating the random error in eddy-covariance based fluxes and other turbulence statistics: the filtering method. Bound.-Layer Meteorol. 144(1), 113–135 (2012)

  35. Sankaran, S., Audet, C., Marsden, A.L.: A method for stochastic constrained optimization using derivative-free surrogate pattern search and collocation. J. Comput. Phys. 229(12), 4664–4682 (2010)

  36. Schonlau, M., Welch, W.J., Jones, D.R.: A data-analytic approach to Bayesian global optimization. In: Department of Statistics and Actuarial Science and The Institute for Improvement in Quality and Productivity, 1997 ASA Conference (1997)

  37. Sethi, S.P., Zhang, Q., Zhang, H.-Q.: Average-Cost Control of Stochastic Manufacturing Systems, vol. 54. Springer, Berlin (2005)

  38. Slivkins, A.: Multi-armed bandits on implicit metric spaces. In: Advances in Neural Information Processing Systems, pp. 1602–1610 (2011)

  39. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp. 2951–2959 (2012)

  40. Srinivas, N., Krause, A., Kakade, S.M., Seeger, M.W.: Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans. Inf. Theory 58(5), 3250–3265 (2012)

  41. Stewart, I.: Mathematics: The Lorenz attractor exists. Nature 406(6799), 948 (2000)

  42. Talgorn, B., Le Digabel, S., Kokkolaras, M.: Statistical surrogate formulations for simulation-based design optimization. J. Mech. Des. 137(2), 021405 (2015)

  43. Talnikar, C., Blonigan, P., Bodart, J., Wang, Q.: Parallel optimization for LES. In: Proceedings of the Summer Program, p. 315 (2014)

  44. Theunissen, R., Di Sante, A., Riethmuller, M.L., Van den Braembussche, R.A.: Confidence estimation using dependent circular block bootstrapping: application to the statistical analysis of PIV measurements. Exp. Fluids 44(4), 591–596 (2008)

  45. Valko, M., Carpentier, A., Munos, R.: Stochastic simultaneous optimistic optimization. In: International Conference on Machine Learning, pp. 19–27 (2013)

  46. Wahba, G.: Spline Models for Observational Data, vol. 59. SIAM, Philadelphia (1990)

  47. Zhao, M., Alimo, S.R., Bewley, T.R.: An active subspace method for accelerating convergence in Delaunay-based optimization via dimension reduction. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 2765–2770. IEEE (2018)


Acknowledgements

We would like to thank the reviewers for their insightful comments, which led to improvements in this work. We also gratefully acknowledge Prof. Phillip Gill and Prof. Alison Marsden for their collaboration, and funding from AFOSR FA 9550-12-1-0046, the Cymer Center for Control Systems & Dynamics, and Leidos Corporation in support of this work.

Author information

Corresponding author

Correspondence to Pooriya Beyhaghi.

Appendices

Appendix 1: Polyharmonic spline regression

The algorithm described in this paper depends upon a smooth regression \(p^k(x)\) (see Assumption 1). The best technique for computing this regression is problem dependent. As in [12,13,14], a key advantage of our Delaunay-based approach in the present work is that it facilitates the use of any suitable regression technique, provided it satisfies the “strict” regression property given in Definition 4. Since our numerical tests all implement the polyharmonic spline regression technique, its derivation is briefly explained in this appendix; additional details may be found in [46].

The polyharmonic spline regression p(x) of a function f(x) in \({\mathbb {R}}^n\) is defined as a weighted sum of radial basis functions \(\varphi (r_i)\), each centered at a measurement point \(x_i\), plus a linear function of x:

$$\begin{aligned} p(x) = \sum _{i = 1}^N w_i\,\varphi (r_i) + v^T \begin{bmatrix} 1 \\ x \end{bmatrix}, \quad \text {where} \quad \varphi (r) = r^3 \quad \text {and} \quad r_i = {\Vert }x - x_i{\Vert }. \end{aligned}$$
(47)

The weights \(w_i\) and the coefficients \(v_i\) represent N and \(n+1\) unknowns, respectively. Assume that \(\{y(x_1), y(x_2), \dots , y(x_N)\}\) is the set of measurements, with standard deviations \(\{\sigma _1, \sigma _2, \dots , \sigma _N\}\). The \(w_i\) and \(v_i\) coefficients are computed by minimizing the following objective function, which expresses a tradeoff between the fit to the observed data and the smoothness of the regressor:

$$\begin{aligned} L_p = \sum _{i=1}^N \left[ \frac{p(x_i)-y(x_i)}{\sigma _i} \right] ^2 + \lambda \int _{B} {\Vert }\nabla ^m p(x){\Vert }^2 \, dx, \end{aligned}$$
(48)

where B is a large box domain containing all of the \(x_i\), and \(\nabla ^m p(x)\) is the vector of all m-th-order derivatives of p(x) (see [20]). It is shown in [20] that the first-order optimality condition for the objective function (48) is:

$$\begin{aligned} p(x_i)-y(x_i)+ \rho \,\sigma _i^2 w_i=0, \quad \forall 1 \le i \le N, \end{aligned}$$
(49)

where \(\rho \) is a parameter proportional to \(\lambda \). In summary, the coefficients of the regression can be derived by solving:

$$\begin{aligned}&\begin{bmatrix} F & V^T \\ V & 0 \end{bmatrix} \begin{bmatrix} w \\ v \end{bmatrix}= \begin{bmatrix} y \\ 0 \end{bmatrix}, \nonumber \\&\quad F_{ij} = \varphi ({\Vert }x_i-x_j{\Vert }) + \rho \,\delta _{ij} \,\sigma _i^2, \qquad V = \begin{bmatrix} 1 & 1 & \dots & 1 \\ x_1 & x_2 & \dots & x_N \end{bmatrix}, \end{aligned}$$
(50)

where y is the vector of measurements \(y(x_i)\) and \(\delta _{ij}\) is the Kronecker delta.
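
For concreteness, a minimal Python/NumPy sketch of assembling and solving the saddle-point system (50), and of evaluating the resulting regression (47), is given below. The function names, the dense direct solve, and the absence of any safeguarding are illustrative assumptions, not part of the published implementation.

```python
import numpy as np

def polyharmonic_spline_regression(X, y, sigma, rho):
    """Solve system (50) for the weights w and linear coefficients v.

    X     : (N, n) array of measurement points x_i
    y     : (N,)   array of measurements y(x_i)
    sigma : (N,)   array of standard deviations sigma_i
    rho   : smoothing parameter in [0, inf)
    """
    N, n = X.shape
    # Pairwise distances r_ij = ||x_i - x_j|| and cubic kernel phi(r) = r^3
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    F = r**3 + rho * np.diag(sigma**2)   # F_ij = phi(||x_i - x_j||) + rho delta_ij sigma_i^2
    V = np.vstack([np.ones(N), X.T])     # (n+1, N) block of ones and coordinates
    A = np.block([[F, V.T], [V, np.zeros((n + 1, n + 1))]])
    b = np.concatenate([y, np.zeros(n + 1)])
    sol = np.linalg.solve(A, b)          # saddle-point system (50)
    return sol[:N], sol[N:]

def evaluate_spline(x, X, w, v):
    """Evaluate p(x) = sum_i w_i ||x - x_i||^3 + v^T [1; x], as in (47)."""
    r = np.linalg.norm(X - x, axis=1)
    return w @ r**3 + v @ np.concatenate([[1.0], x])
```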

The remaining task when computing the regression is to find an appropriate value of \(\rho \in [0, \infty )\). Solving (50) for any value of \(\rho \) gives a unique regression, denoted \(p(x,\rho )\). The parameter \(\rho \) is then obtained from a predictive mean-square error criterion developed in §4.4 of [46], which is imposed via the following condition:

$$\begin{aligned} T(\rho )=\sum _{i=1}^N \left[ \frac{p(x_i,\rho )-y(x_i)}{\sigma _i}\right] ^2=1. \end{aligned}$$
(51)

As \(\rho \rightarrow \infty \), \(w_i\rightarrow 0\), and the solution of (50) reduces to a weighted least-squares linear regression. If \(T(\infty ) \le 1\), we take this linear regression as the best current regression for the available data. Otherwise, we have \(T(\infty )>1\) and (by construction) \(T(0)=0\); thus, (51) has a solution with finite \(\rho >0\), which gives the desired regression.

If \(T(\infty ) > 1\), we thus seek a \(\rho \) for which \(T(\rho )=1\). Following [46] and using (49), (51) simplifies to:

$$\begin{aligned} T(\rho )=\rho ^2 \sum _{i=1}^N \left( w_{i,\rho } \,\sigma _i\right) ^2 =1, \end{aligned}$$
(52)

where \(w_{i, \rho }\) denotes the \(w_i\) obtained by solving (50) for the given \(\rho \). Defining Dw and Dv as the vectors whose i-th elements are the derivatives of \(w_i\) and \(v_i\) with respect to \(\rho \), it follows that

$$\begin{aligned}&T'(\rho )= 2\,\rho ^2 \sum _{i=1}^N w_{i,\rho } \, Dw_i \, \sigma _i^2 + 2\, \rho \sum _{i=1}^N \left( w_{i,\rho } \,\sigma _i\right) ^2, \\&\begin{bmatrix} F & V^T \\ V & 0 \end{bmatrix} \begin{bmatrix} Dw \\ Dv \end{bmatrix}+ \begin{bmatrix} {\varSigma }_2 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} w \\ v \end{bmatrix}= \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \end{aligned}$$

where \({\varSigma }_2\) is the diagonal matrix whose i-th diagonal element is \(\sigma _i^2\). An analytic expression for the derivative of \(T(\rho )\) is therefore available, and (51) can be solved quickly using Newton’s method.
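
A sketch of this Newton iteration follows, reusing the hypothetical polyharmonic_spline_regression helper from the previous listing; the initial guess, tolerance, and lack of safeguarding are assumptions made for brevity.

```python
def fit_rho_newton(X, y, sigma, rho0=1.0, tol=1e-8, max_iter=50):
    """Find rho > 0 with T(rho) = rho^2 sum_i (w_i sigma_i)^2 = 1, as in (52)."""
    import numpy as np
    N, n = X.shape
    rho = rho0
    for _ in range(max_iter):
        w, _ = polyharmonic_spline_regression(X, y, sigma, rho)
        T = rho**2 * np.sum((w * sigma)**2)
        if abs(T - 1.0) < tol:
            break
        # Differentiate (50) with respect to rho:
        #   [[F, V^T], [V, 0]] [Dw; Dv] = -[Sigma_2 w; 0],  Sigma_2 = diag(sigma_i^2)
        r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        F = r**3 + rho * np.diag(sigma**2)
        V = np.vstack([np.ones(N), X.T])
        A = np.block([[F, V.T], [V, np.zeros((n + 1, n + 1))]])
        Dw = np.linalg.solve(A, -np.concatenate([sigma**2 * w, np.zeros(n + 1)]))[:N]
        dT = 2 * rho**2 * np.sum(w * Dw * sigma**2) + 2 * rho * np.sum((w * sigma)**2)
        rho -= (T - 1.0) / dT    # Newton step on T(rho) - 1 = 0
    return rho
```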

The regression process presented here, imposing (51) as suggested by [46], is designed to obtain a regression which is reasonably smooth. However, there is no guarantee that this particular regression satisfies the strictness property required in the present work (see Definition 4). Note, however, that by imposing \(\rho =0\), the regression is made strict for arbitrarily small \(\beta \). Thus, to satisfy strictness for a given finite \(\beta \), the value of \(\rho \) must sometimes be decreased from that which satisfies (51), as necessary.

Appendix 2: UQ for finite-time-averaging of the Lorenz system

This appendix briefly summarizes the simple empirical approach used in this paper to quantify the uncertainty of the cost function (44) when it is estimated using a finite time average.

In the method used, we simply simulated the Lorenz system (42) 30 independent times with different initial values of (X, Y, Z). The cost function was then approximated over a range of averaging times T, and the standard deviation of the estimates at each averaging time was calculated across the 30 realizations. The simple model \({A}/{\sqrt{T}}\) for the uncertainty was found to fit this empirical calculation quite well, as shown in Fig. 10.
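
A minimal Python sketch of this procedure follows. Since the cost function (44), system (42), and sampling details are defined in the main text, the classical Lorenz parameters, the time average of z as a stand-in statistic, the burn-in period, and the set of averaging times used below are illustrative assumptions only.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sig=10.0, r=28.0, b=8.0 / 3.0):
    # Classical Lorenz system; stands in for system (42) of the main text
    x, y, z = s
    return [sig * (y - x), x * (r - z) - y, x * y - b * z]

def finite_time_average(T, s0, dt=0.01, burn_in=10.0):
    # Time average of a stand-in statistic (here z) over [burn_in, burn_in + T]
    t_eval = np.arange(0.0, burn_in + T, dt)
    sol = solve_ivp(lorenz, (0.0, burn_in + T), s0, t_eval=t_eval, rtol=1e-8, atol=1e-8)
    return sol.y[2][t_eval >= burn_in].mean()

rng = np.random.default_rng(0)
T_values = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
std_per_T = []
for T in T_values:
    # 30 independent trajectories from random initial conditions, as in the text
    samples = [finite_time_average(T, rng.normal(scale=5.0, size=3)) for _ in range(30)]
    std_per_T.append(np.std(samples))
std_per_T = np.array(std_per_T)

# Least-squares fit of the uncertainty model sigma(T) = A / sqrt(T)
A = np.sum(std_per_T / np.sqrt(T_values)) / np.sum(1.0 / T_values)
print("fitted A =", A)
```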

Fig. 10: Uncertainty quantification (UQ) for finite-time-average approximations of the infinite-time-average statistic of interest in the cost function related to the Lorenz model. The UQ model \({A}/{\sqrt{T}}\) is found to give a very good empirical fit.

Appendix 3: Application of \(\alpha \)-DOGS to higher-dimensional problems

We now consider the challenge of increasing the dimension of the problem. This is rather difficult, as the computational cost of the algorithm developed herein (specifically, that of computing the Delaunay triangulation) scales rather poorly as the dimension of the problem is increased. In this appendix, the convergence behavior on a 5-dimensional problem is thus investigated.

The test problem considered here is the stochastically-obscured scaled Schwefel function (41) for \(n=5\). As in Sect. 6, the performance of the \(\alpha \)-DOGS, \({\varDelta }\)-DOGS, and SMF algorithms is characterized. In this investigation, a fixed averaging time of \(T=100\) was used for all datapoints in the \({\varDelta }\)-DOGS and SMF algorithms. Figure 11 shows the convergence history of the regret function for each method. It is observed that the total averaging time required by \(\alpha \)-DOGS is significantly smaller than that of the other methods considered.
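
For reference, a hedged sketch of a stochastically-obscured evaluation with a tunable averaging time T is given below; the standard (unscaled) Schwefel form and the \(1/\sqrt{T}\) noise model stand in for the scaled function (41) and the sampling model of the main text.

```python
import numpy as np

def schwefel(x):
    # Standard Schwefel function; a stand-in for the scaled form (41)
    x = np.asarray(x, dtype=float)
    return 418.9829 * x.size - np.sum(x * np.sin(np.sqrt(np.abs(x))))

def noisy_evaluation(x, T, noise_amplitude=1.0, rng=np.random.default_rng()):
    # Finite-time-average evaluation: sampling error decays like 1/sqrt(T)
    return schwefel(x) + noise_amplitude / np.sqrt(T) * rng.standard_normal()
```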

To scale to even higher-dimensional problems, the practical (though perhaps not provably globally convergent) idea of using only approximate Delaunay triangulations has been proposed, and will be considered in future work.

Fig. 11: Convergence history of the optimization algorithms for (41) with \(n=5\).

Cite this article

Beyhaghi, P., Alimo, R. & Bewley, T. A derivative-free optimization algorithm for the efficient minimization of functions obtained via statistical averaging. Comput Optim Appl 76, 1–31 (2020). https://doi.org/10.1007/s10589-020-00172-4
