Abstract
A strategy is proposed for characterizing the worst-case performance of algorithms for solving nonconvex smooth optimization problems. Contemporary analyses characterize worst-case performance by providing, under certain assumptions on an objective function, an upper bound on the number of iterations (or function or derivative evaluations) required until a pth-order stationarity condition is approximately satisfied. This arguably leads to conservative characterizations, since the bounds are driven by pathological worst-case objectives rather than by ones typically encountered in practice. By contrast, the strategy proposed in this paper characterizes worst-case performance separately over regions comprising a search space. These regions are defined generically based on properties of derivative values. In this manner, one can analyze the worst-case performance of an algorithm independently from any particular class of objectives. Then, once given a class of objectives, one can obtain a tailored complexity analysis merely by delineating the types of regions that comprise the search spaces for functions in the class. Regions defined by first- and second-order derivatives are discussed in detail and example complexity analyses are provided for a few standard first- and second-order algorithms when employed to minimize convex and nonconvex objectives of interest. It is also explained how the strategy can be generalized to regions defined by higher-order derivatives and for analyzing the behavior of higher-order algorithms.
Notes
For the sake of brevity, we focus on worst-case complexity in terms of upper bounds on the number of iterations required until a termination condition is satisfied, although in general one should also take function and derivative evaluation complexity into account. These can be considered in the same manner as iteration complexity in our proposed strategy.
Some authors take the term gradient-dominated to mean gradient-dominated of degree 2. We do not take this meaning since, as seen in [32] and in this paper, functions that are only gradient-dominated of degree 1 offer different and interesting results.
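For concreteness, gradient dominance of degree \(p\) is commonly stated as follows. This is a sketch of the standard definition (as in Polyak [44]); the constant \(c\) and the lower bound \(f_{\inf}\) are generic placeholders, and the exact constants used in the paper may differ:

```latex
% Gradient dominance of degree p, with p in [1,2]:
% there exists c > 0 such that, for all x in the domain,
f(x) - f_{\inf} \;\le\; c\,\|\nabla f(x)\|^{p}.
% Degree p = 2 recovers the Polyak--Lojasiewicz condition,
% while p = 1 is the weaker variant referenced in this note.
```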
In this case, the decrease in the objective would be indicative of an m-step linear (for part (a)) or m-step sublinear (for part (b)) rate of convergence. We do not explicitly refer to such a multi-step aspect of a convergence rate since it is always clear from the context.
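As a sketch of what an m-step rate means, the following hypothetical gap sequences (not taken from the paper; the constants `m`, `rho`, and `C` are illustrative placeholders) contrast an m-step linear contraction, where \(f_{k+m} - f_{ref} \le \rho\,(f_k - f_{ref})\) for some \(\rho < 1\), with an m-step sublinear decay:

```python
# Illustrative sketch (not from the paper): gaps {f_k - f_ref} for two
# hypothetical runs, one with an m-step linear rate and one with an
# m-step sublinear rate. All constants are placeholders.
m, rho, C = 3, 0.5, 10.0

def gap_linear(k, e0=1.0):
    """m-step linear: the gap contracts by the factor rho every m iterations."""
    return e0 * rho ** (k // m)

def gap_sublinear(k):
    """m-step sublinear: the gap decays like C divided by the block count k // m."""
    return C / max(k // m, 1)

# Verify the m-step linear contraction over consecutive blocks of m iterations.
for k in range(0, 30, m):
    assert gap_linear(k + m) <= rho * gap_linear(k)

print(gap_linear(30), gap_sublinear(30))  # 0.0009765625 1.0
```

After 30 iterations (10 blocks of m = 3), the linearly contracting gap has shrunk by a factor of \(2^{10}\), while the sublinear gap has only shrunk by a factor of 10, which is the distinction the note glosses over by dropping the "m-step" qualifier.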
An interesting scenario arises in this theorem when \(x_{k+1} \in \mathcal{R}_{1}^{1}\), during which \(\{f_k - f_{\mathrm{ref}}\}\) might initially decrease at a superlinear rate. However, this should not be overstated: if this scenario occurs at all, then the number of iterations in which it occurs will be limited if the iterates remain at or near points in \(\mathcal{R}_{1}\).
References
Birgin, E.G., Gardenghi, J.L., Martínez, J.M., Santos, S.A., Toint, Ph.L.: Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. Math. Program. 163(1), 359–368 (2017)
Birgin, E.G., Martínez, J.M.: The use of quadratic regularization with a cubic descent condition for unconstrained optimization. SIAM J. Optim. 27(2), 1049–1074 (2017)
Borgwardt, K.-H.: The average number of pivot steps required by the Simplex-Method is polynomial. Zeitschrift für Operations Research 26(1), 157–177 (1982)
Carmon, Y., Hinder, O., Duchi, J.C., Sidford, A.: “Convex until proven guilty”: dimension-free acceleration of gradient descent on non-convex functions. In: Proceedings of the International Conference on Machine Learning, PMLR Vol. 70, pp. 654–663 (2017)
Cartis, C., Gould, N.I.M., Toint, Ph.L.: On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM J. Optim. 20(6), 2833–2852 (2010)
Cartis, C., Gould, N.I.M., Toint, Ph.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: Motivation, convergence and numerical results. Math. Program. 127, 245–295 (2011)
Cartis, C., Gould, N.I.M., Toint, Ph.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part II: Worst-case function- and derivative-evaluation complexity. Math. Program. 130(2), 295–319 (2011)
Cartis, C., Gould, N.I.M., Toint, Ph.L.: Optimal Newton-type methods for nonconvex smooth optimization problems. Technical Report ERGO Technical Report 11-009, School of Mathematics, University of Edinburgh (2011)
Cartis, C., Gould, N.I.M., Toint, Ph.L.: Evaluation complexity bounds for smooth constrained nonlinear optimisation using scaled KKT conditions, high-order models and the criticality measure \(\chi \). CoRR. arXiv:1705.04895 (2017)
Cartis, C., Gould, N.I.M., Toint, Ph.L.: Worst-case evaluation complexity of regularization methods for smooth unconstrained optimization using Hölder continuous gradients. Optim. Methods Softw. 32(6), 1273–1298 (2017)
Cartis, C., Scheinberg, K.: Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math. Program. 169(2), 337–375 (2018)
Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods. Society for Industrial and Applied Mathematics (SIAM) (2000)
Curtis, F.E., Lubberts, Z., Robinson, D.P.: Concise complexity analyses for trust region methods. Optim. Lett. 12(8), 1713–1724 (2018)
Curtis, F.E., Robinson, D.P.: Exploiting negative curvature in deterministic and stochastic optimization. Math. Program. Ser. B 176(1), 69–94 (2019)
Curtis, F.E., Robinson, D.P., Samadi, M.: A trust region algorithm with a worst-case iteration complexity of \({\cal{O}}(\epsilon ^{-3/2})\) for nonconvex optimization. Math. Program. 162(1), 1–32 (2017)
Curtis, F.E., Robinson, D.P., Samadi, M.: An inexact regularized Newton framework with a worst-case iteration complexity of \({\cal{O}}(\epsilon ^{-3/2})\) for nonconvex optimization. IMA J. Numer. Anal. (2018). https://doi.org/10.1093/imanum/dry022
Dauphin, Y.N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., Bengio, Y.: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Proceedings of the International Conference on Neural Information Processing Systems, pp. 2933–2941 (2014)
Dussault, J.-P.: ARCq: a new adaptive regularization by cubics. Optim. Methods Softw. 33(2), 322–335 (2018)
Dussault, J.-P., Orban, D.: Scalable adaptive cubic regularization methods. Technical Report G-2015-109, GERAD (2017)
Fan, J., Yuan, Y.: A new trust region algorithm with trust region radius converging to zero. In: Proceedings of the International Conference on Optimization: Techniques and Applications, ICOTA, pp. 786–794 (2001)
Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points: online stochastic gradient for tensor decomposition. In: Proceedings of the Conference on Learning Theory, COLT, pp. 797–842 (2015)
Gould, N.I.M., Porcelli, M., Toint, Ph.L.: Updating the regularization parameter in the adaptive cubic regularization algorithm. Comput. Optim. Appl. 53(1), 1–22 (2012)
Grapiglia, G.N., Yuan, J., Yuan, Y.: On the convergence and worst-case complexity of trust-region and regularization methods for unconstrained optimization. Math. Program. 152(1–2), 491–520 (2015)
Grapiglia, G.N., Yuan, J., Yuan, Y.: Nonlinear stepsize control algorithms: complexity bounds for first- and second-order optimality. J. Optim. Theory Appl. 171(3), 980–997 (2016)
Gratton, S., Royer, C.W., Vicente, L.N.: A decoupled first/second-order steps technique for nonconvex nonlinear unconstrained optimization with improved complexity bounds. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1328-7
Jin, C., Ge, R., Netrapalli, P., Kakade, S.M., Jordan, M.I.: How to escape saddle points efficiently. In: Proceedings of the International Conference on Machine Learning, ICML, pp. 1724–1732 (2017)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 795–811 (2016)
Lee, J.D., Panageas, I., Piliouras, G., Simchowitz, M., Jordan, M.I., Recht, B.: First-order methods almost always avoid strict saddle points. Math. Program. 176(1), 311–337 (2019)
Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Proceedings of the Conference on Learning Theory, COLT, pp. 1246–1257 (2016)
Liu, M., Li, Z., Wang, X., Yi, J., Yang, T.: Adaptive negative curvature descent with applications in non-convex optimization. In: Proceedings of the International Conference on Neural Information Processing Systems, NeurIPS, pp. 4854–4863 (2018)
Nesterov, Yu.: Introductory Lectures on Convex Optimization. Springer, New York (2004)
Nesterov, Yu., Polyak, B.T.: Cubic regularization of Newton’s method and its global performance. Math. Program. 108(1), 117–205 (2006)
Nocedal, J., Wright, S.J.: Numerical Optimization, Second edn. Springer, New York (2006)
Paternain, S., Mokhtari, A., Ribeiro, A.: A Newton-based method for nonconvex optimization with fast evasion of saddle points. SIAM J. Optim. 29(1), 343–368 (2019)
Polyak, B.T.: Gradient methods for minimization of functionals. USSR Comput. Math. Math. Phys. 3(3), 643–653 (1963)
Royer, C., Wright, S.J.: Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization. SIAM J. Optim. 28(2), 1448–1477 (2018)
Smale, S.: On the average number of steps of the simplex method of linear programming. Math. Program. 27(3), 241–262 (1983)
Spielman, D.A., Teng, S.-H.: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. J. Assoc. Comput. Mach. 51(3), 385–463 (2004)
Toint, Ph.L.: Nonlinear stepsize control, trust regions and regularizations for unconstrained optimization. Optim. Methods Softw. 28(1), 82–95 (2013)
Additional information
Supported by the U.S. Department of Energy, Office of Science, Early Career Research Program under Award Number DE–SC0010615 (Advanced Scientific Computing Research), and by the U.S. National Science Foundation under Award Numbers CCF-1740796 and CCF-1618717 (Division of Computing and Communication Foundations) and IIS-1704458 (Division of Information and Intelligent Systems).
Cite this article
Curtis, F.E., Robinson, D.P. Regional complexity analysis of algorithms for nonconvex smooth optimization. Math. Program. 187, 579–615 (2021). https://doi.org/10.1007/s10107-020-01492-3
Keywords
- Nonlinear optimization
- Nonconvex optimization
- Worst-case iteration complexity
- Worst-case evaluation complexity
- Regional complexity analysis