
Regional complexity analysis of algorithms for nonconvex smooth optimization

  • Full Length Paper
  • Series A
Mathematical Programming

Abstract

A strategy is proposed for characterizing the worst-case performance of algorithms for solving nonconvex smooth optimization problems. Contemporary analyses characterize worst-case performance by providing, under certain assumptions on an objective function, an upper bound on the number of iterations (or function or derivative evaluations) required until a pth-order stationarity condition is approximately satisfied. This arguably leads to conservative characterizations based on certain objectives rather than on ones that are typically encountered in practice. By contrast, the strategy proposed in this paper characterizes worst-case performance separately over regions comprising a search space. These regions are defined generically based on properties of derivative values. In this manner, one can analyze the worst-case performance of an algorithm independently from any particular class of objectives. Then, once given a class of objectives, one can obtain a tailored complexity analysis merely by delineating the types of regions that comprise the search spaces for functions in the class. Regions defined by first- and second-order derivatives are discussed in detail and example complexity analyses are provided for a few standard first- and second-order algorithms when employed to minimize convex and nonconvex objectives of interest. It is also explained how the strategy can be generalized to regions defined by higher-order derivatives and for analyzing the behavior of higher-order algorithms.
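
As a rough illustration of the region-based viewpoint described in the abstract (a sketch only, not the paper's algorithms, region definitions, or thresholds), the following Python snippet classifies iterates by their gradient norm and least Hessian eigenvalue and tallies how many iterations a simple gradient method spends in each region. All function names, tolerances, and the test objective are hypothetical.

```python
# Minimal sketch of a region-based tally (hypothetical names and thresholds).
import numpy as np

def classify_region(grad, hess, eps_g=1e-4, eps_h=1e-4):
    """Label a point using first- and second-order derivative information."""
    gnorm = np.linalg.norm(grad)
    lam_min = np.linalg.eigvalsh(hess)[0]  # smallest Hessian eigenvalue
    if gnorm <= eps_g and lam_min >= -eps_h:
        return "approx-2nd-order-stationary"
    if gnorm > eps_g:
        return "large-gradient"        # first-order decrease is available
    return "negative-curvature"        # decrease via a curvature direction

def run_with_region_tally(grad, hess, x0, alpha=1e-2, max_iters=500):
    """Fixed-step gradient descent that records iterations spent per region."""
    x = np.asarray(x0, dtype=float)
    tally = {}
    for _ in range(max_iters):
        g, H = grad(x), hess(x)
        label = classify_region(g, H)
        tally[label] = tally.get(label, 0) + 1
        if label == "approx-2nd-order-stationary":
            break
        x = x - alpha * g
    return x, tally

# Hypothetical nonconvex test function with a strict saddle point at the origin:
# f(x) = 0.5*x0^2 - 0.25*x1^2 + 0.25*x1^4
grad = lambda x: np.array([x[0], -0.5 * x[1] + x[1] ** 3])
hess = lambda x: np.diag([1.0, -0.5 + 3.0 * x[1] ** 2])

x_final, tally = run_with_region_tally(grad, hess, x0=[1.0, 0.3])
print(tally)  # iterations split among the regions visited
```

In the spirit of the abstract, a regional complexity bound would then combine such per-region iteration counts with worst-case guarantees established separately for each type of region.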


Notes

  1. For the sake of brevity, we focus on worst-case complexity in terms of upper bounds on the number of iterations required until a termination condition is satisfied, although in general one should also take function and derivative evaluation complexity into account. These can be considered in the same manner as iteration complexity in our proposed strategy.

  2. Some authors take the term gradient-dominated to mean gradient-dominated of degree 2. We do not adopt this convention since, as seen in [32] and in this paper, functions that are only gradient-dominated of degree 1 yield different and interesting results. (The degree-\(p\) condition is displayed after these notes.)

  3. In this case, the decrease in the objective would be indicative of an m-step linear (for part (a)) or m-step sublinear (for part (b)) rate of convergence. We do not refer explicitly to the multi-step aspect of such a convergence rate since it is always clear from the context. (An illustrative form of an m-step rate is displayed after these notes.)

  4. There arises an interesting scenario in this theorem when \(x_{k+1} \in \mathcal{R}_{1}^{1}\), in which case \(\{f_k - f_{ref}\}\) might initially decrease at a superlinear rate. However, this should not be overstated. After all, if this scenario occurs at all, then the number of iterations in which it occurs will be limited if the iterates remain at or near points in \(\mathcal{R}_{1}\).
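
For concreteness regarding notes 2 and 3 above, the displays below record (i) the degree-\(p\) gradient-domination condition as we understand it from [32], and (ii) one common way to formalize an m-step linear rate with respect to a reference value \(f_{ref}\). The constant names \(\tau_f\), \(f_*\), and \(\rho\) are our notation, not necessarily the paper's.

```latex
% (i) Gradient domination of degree p in [1,2] (cf. [32]): for some \tau_f > 0
%     and all x of interest, with f_* the infimum of f,
\[
  f(x) - f_* \;\le\; \tau_f \,\|\nabla f(x)\|^{p}, \qquad p \in [1,2];
\]
% degree 2 corresponds to a Polyak-Lojasiewicz-type inequality.

% (ii) One formalization of an m-step linear rate: for some \rho in (0,1) and all k,
\[
  f_{k+m} - f_{ref} \;\le\; \rho \,\bigl( f_k - f_{ref} \bigr);
\]
% an m-step sublinear rate instead bounds f_{k} - f_{ref} by a quantity that
% decays sublinearly in k (e.g., proportional to 1/k).
```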

References

  1. Birgin, E.G., Gardenghi, J.L., Martínez, J.M., Santos, S.A., Toint, Ph.L.: Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. Math. Program. 163(1), 359–368 (2017)

  2. Birgin, E.G., Martínez, J.M.: The use of quadratic regularization with a cubic descent condition for unconstrained optimization. SIAM J. Optim. 27(2), 1049–1074 (2017)

  3. Borgwardt, K.-H.: The average number of pivot steps required by the Simplex-Method is polynomial. Zeitschrift für Operations Research 26(1), 157–177 (1982)

  4. Carmon, Y., Hinder, O., Duchi, J.C., Sidford, A.: “Convex until proven guilty”: dimension-free acceleration of gradient descent on non-convex functions. In: Proceedings of the International Conference on Machine Learning, PMLR Vol. 70, pp. 654–663 (2017)

  5. Cartis, C., Gould, N.I.M., Toint, Ph.L.: On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM J. Optim. 20(6), 2833–2852 (2010)

  6. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: Motivation, convergence and numerical results. Math. Program. 127, 245–295 (2011)

  7. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part II: Worst-case function- and derivative-evaluation complexity. Math. Program. 130(2), 295–319 (2011)

  8. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Optimal Newton-type methods for nonconvex smooth optimization problems. ERGO Technical Report 11-009, School of Mathematics, University of Edinburgh (2011)

  9. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Evaluation complexity bounds for smooth constrained nonlinear optimisation using scaled KKT conditions, high-order models and the criticality measure \(\chi\). arXiv:1705.04895 (2017)

  10. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Worst-case evaluation complexity of regularization methods for smooth unconstrained optimization using Hölder continuous gradients. Optim. Methods Softw. 32(6), 1273–1298 (2017)

  11. Cartis, C., Scheinberg, K.: Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math. Program. 169(2), 337–375 (2018)

  12. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods. Society for Industrial and Applied Mathematics (SIAM) (2000)

  13. Curtis, F.E., Lubberts, Z., Robinson, D.P.: Concise complexity analyses for trust region methods. Optim. Lett. 12(8), 1713–1724 (2018)

  14. Curtis, F.E., Robinson, D.P.: Exploiting negative curvature in deterministic and stochastic optimization. Math. Program. Ser. B 176(1), 69–94 (2019)

  15. Curtis, F.E., Robinson, D.P., Samadi, M.: A trust region algorithm with a worst-case iteration complexity of \(\mathcal{O}(\epsilon^{-3/2})\) for nonconvex optimization. Math. Program. 162(1), 1–32 (2017)

  16. Curtis, F.E., Robinson, D.P., Samadi, M.: An inexact regularized Newton framework with a worst-case iteration complexity of \(\mathcal{O}(\epsilon^{-3/2})\) for nonconvex optimization. IMA J. Numer. Anal. (2018). https://doi.org/10.1093/imanum/dry022

  17. Dauphin, Y.N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., Bengio, Y.: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Proceedings of the International Conference on Neural Information Processing Systems, pp. 2933–2941 (2014)

  18. Dussault, J.-P.: ARCq: a new adaptive regularization by cubics. Optim. Methods Softw. 33(2), 322–335 (2018)

  19. Dussault, J.-P., Orban, D.: Scalable adaptive cubic regularization methods. Technical Report G-2015-109, GERAD (2017)

  20. Fan, J., Yuan, Y.: A new trust region algorithm with trust region radius converging to zero. In: Proceedings of the International Conference on Optimization: Techniques and Applications, ICOTA, pp. 786–794 (2001)

  21. Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points: online stochastic gradient for tensor decomposition. In: Proceedings of the Conference on Learning Theory, COLT, pp. 797–842 (2015)

  22. Gould, N.I.M., Porcelli, M., Toint, Ph.L.: Updating the regularization parameter in the adaptive cubic regularization algorithm. Comput. Optim. Appl. 53(1), 1–22 (2012)

  23. Grapiglia, G.N., Yuan, J., Yuan, Y.: On the convergence and worst-case complexity of trust-region and regularization methods for unconstrained optimization. Math. Program. 152(1–2), 491–520 (2015)

  24. Grapiglia, G.N., Yuan, J., Yuan, Y.: Nonlinear stepsize control algorithms: complexity bounds for first- and second-order optimality. J. Optim. Theory Appl. 171(3), 980–997 (2016)

  25. Gratton, S., Royer, C.W., Vicente, L.N.: A decoupled first/second-order steps technique for nonconvex nonlinear unconstrained optimization with improved complexity bounds. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1328-7

  26. Jin, C., Ge, R., Netrapalli, P., Kakade, S.M., Jordan, M.I.: How to escape saddle points efficiently. In: Proceedings of the International Conference on Machine Learning, ICML, pp. 1724–1732 (2017)

  27. Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 795–811 (2016)

  28. Lee, J.D., Panageas, I., Piliouras, G., Simchowitz, M., Jordan, M.I., Recht, B.: First-order methods almost always avoid strict saddle points. Math. Program. 176(1), 311–337 (2019)

  29. Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Proceedings of the Conference on Learning Theory, COLT, pp. 1246–1257 (2016)

  30. Liu, M., Li, Z., Wang, X., Yi, J., Yang, T.: Adaptive negative curvature descent with applications in non-convex optimization. In: Proceedings of the International Conference on Neural Information Processing Systems, NeurIPS, pp. 4854–4863 (2018)

  31. Nesterov, Yu.: Introductory Lectures on Convex Optimization. Springer, New York (2004)

  32. Nesterov, Yu., Polyak, B.T.: Cubic regularization of Newton’s method and its global performance. Math. Program. 108(1), 177–205 (2006)

  33. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)

  34. Paternain, S., Mokhtari, A., Ribeiro, A.: A Newton-based method for nonconvex optimization with fast evasion of saddle points. SIAM J. Optim. 29(1), 343–368 (2019)

  35. Polyak, B.T.: Gradient methods for minimization of functionals. USSR Comput. Math. Math. Phys. 3(3), 643–653 (1963)

  36. Royer, C.W., Wright, S.J.: Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization. SIAM J. Optim. 28(2), 1448–1477 (2018)

  37. Smale, S.: On the average number of steps of the simplex method of linear programming. Math. Program. 27(3), 241–262 (1983)

  38. Spielman, D.A., Teng, S.-H.: Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. J. Assoc. Comput. Mach. 51(3), 385–463 (2004)

  39. Toint, Ph.L.: Nonlinear stepsize control, trust regions and regularizations for unconstrained optimization. Optim. Methods Softw. 28(1), 82–95 (2013)

Author information

Correspondence to Frank E. Curtis.

Additional information

Supported by the U.S. Department of Energy, Office of Science, Early Career Research Program under Award Number DE–SC0010615 (Advanced Scientific Computing Research), and by the U.S. National Science Foundation under Award Numbers CCF-1740796 and CCF-1618717 (Division of Computing and Communication Foundations) and IIS-1704458 (Division of Information and Intelligent Systems).

About this article

Cite this article

Curtis, F.E., Robinson, D.P. Regional complexity analysis of algorithms for nonconvex smooth optimization. Math. Program. 187, 579–615 (2021). https://doi.org/10.1007/s10107-020-01492-3

