Abstract
We investigate an inertial algorithm of gradient type for the minimization of a non-convex differentiable function. The algorithm is formulated in the spirit of Nesterov’s accelerated convex gradient method. We prove abstract convergence results which, applied to our numerical scheme, show that the generated sequences converge to a critical point of the objective function, provided a regularization of the objective function satisfies the Kurdyka–Łojasiewicz property. Further, we obtain convergence rates for the generated sequences and the objective function values, formulated in terms of the Łojasiewicz exponent of a regularization of the objective function. Finally, we present numerical experiments that compare our numerical scheme with some well-known algorithms from the literature.
References
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set Valued Anal. 9, 3–11 (2001)
Apidopoulos, V., Aujol, J.F., Dossal, Ch.: Convergence rate of inertial Forward–Backward algorithm beyond Nesterov’s rule. Math. Program. 180, 137–156 (2020)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. Ser. B 116(1–2), 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for non-convex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. Ser. B 168(1–2), 123–175 (2018)
Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case \(\alpha \le 3\). ESAIM COCV 25, Article number 2 (2019)
Attouch, H., Goudou, X., Redont, P.: The heavy ball with friction method, I. The continuous dynamical system: global exploration of the local minima of real-valued function by asymptotic analysis of a dissipative dynamical system. Commun. Contemp. Math. 2(1), 1–34 (2000)
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J. Optim. 24(1), 232–256 (2014)
Attouch, H., Peypouquet, J., Redont, P.: Fast convex optimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)
Aujol, J.F., Dossal, Ch., Rondepierre, A.: Optimal convergence rates for Nesterov acceleration. SIAM J. Optim. 29(4), 3131–3153 (2019)
Aujol, J.F., Dossal, C.: Optimal rate of convergence of an ODE associated to the Fast Gradient Descent schemes for \(b > 0\). HAL preprint https://hal.inria.fr/hal-01547251v2/document
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Bégout, P., Bolte, J., Jendoubi, M.A.: On damped second-order gradient systems. J. Differ. Equ. 259, 3115–3143 (2015)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for non-convex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2006)
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Boţ, R.I., Csetnek, E.R.: Approaching nonsmooth non-convex optimization problems through first order dynamical systems with hidden acceleration and Hessian driven damping terms. Set Valued Var. Anal. 26, 227–245 (2018)
Boţ, R.I., Csetnek, E.R.: A forward-backward dynamical approach to the minimization of the sum of a nonsmooth convex with a smooth non-convex function. ESAIM COCV 24(2), 463–477 (2018)
Boţ, R.I., Csetnek, E.R.: Newton-like dynamics associated to non-convex optimization problems. In: Hosseini, S., Mordukhovich, B., Uschmajew, A. (eds.) Nonsmooth Optimization and Its Applications, International Series of Numerical Mathematics, vol. 170, pp. 131–149. Birkhäuser, Cham (2019)
Boţ, R.I., Csetnek, E.R., Hendrich, C.: Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256, 472–487 (2015)
Boţ, R.I., Csetnek, E.R., László, S.C.: Approaching nonsmooth non-convex minimization through second-order proximal-gradient dynamical systems. J. Evol. Equ. 18(3), 1291–1318 (2018)
Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward–backward algorithm for minimizing the sum of two non-convex functions. EURO J. Comput. Optim. 4(1), 3–25 (2016)
Boţ, R.I., Csetnek, E.R., László, S.C.: A second order dynamical approach with variable damping to non-convex smooth minimization. Appl. Anal. 99(3), 361–378 (2020)
Boţ, R.I., Nguyen, D.K.: The proximal alternating direction method of multipliers in the non-convex setting: convergence analysis and rates. arXiv:1801.01994
Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166(3), 968–982 (2015)
Chill, R.: On the Łojasiewicz–Simon gradient inequality. J. Funct. Anal. 201, 572–601 (2003)
Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward–backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162(1), 107–132 (2014)
Combettes, P.L., Glaudin, L.E.: Quasinonexpansive iterations on the affine hull of orbits: from Mann’s mean value algorithm to inertial methods. SIAM J. Optim. 27(4), 2356–2380 (2017)
van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84(2), 497–540 (1996)
Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka–Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165(3), 874–900 (2015)
Garrigos, G., Rosasco, L., Villa, S.: Convergence of the Forward-Backward algorithm: beyond the worst-case with the help of geometry, https://arxiv.org/pdf/1703.09477.pdf
Ghadimi, E., Feyzmahdavian, H.R., Johansson, M.: Global convergence of the heavy-ball method for convex optimization. In: 2015 European Control Conference (ECC). IEEE, pp. 310–315 (2015)
Haraux, A., Jendoubi, M.: Convergence of solutions of second-order gradient-like systems with analytic nonlinearities. J. Differ. Equ. 144(2), 313–320 (1998)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier (Grenoble) 48(3), 769–783 (1998)
Lessard, L., Recht, B., Packard, A.: Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26, 57–95 (2016)
Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18(5), 1–34 (2018)
Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique Paris, pp. 87–89 (1963)
Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51(2), 311–325 (2015)
Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983). (Russian)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Dordrecht (2004)
Ochs, P.: Local convergence of the heavy-ball method and iPiano for non-convex optimization. J. Optim. Theory Appl. 177(1), 153–180 (2018)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. U.S.S.R. Comput. Math. Math. Phys. 4(5), 1–17 (1964)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, Fundamental Principles of Mathematical Sciences, vol. 317. Springer, Berlin (1998)
Simon, L.: Asymptotics for a class of nonlinear evolution equations, with applications to geometric problems. Ann. Math. 118(3), 525–571 (1983)
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17, 1–43 (2016)
Sun, T., Yin, P., Li, D., Huang, C., Guan, L., Jiang, H.: Non-ergodic convergence analysis of heavy-ball algorithms. In: The Thirty-Third AAAI Conference on Artificial Intelligence (2019)
Zavriev, S.K., Kostyuk, F.V.: Heavy-ball method in non-convex optimization problems. Comput. Math. Model. 4, 336–341 (1993)
Acknowledgements
The author is thankful to two anonymous referees for their valuable remarks and suggestions which led to the improvement of the quality of the paper.
This work was supported by a grant of Ministry of Research and Innovation, CNCS—UEFISCDI, Project Number PN-III-P1-1.1-TE-2016-0266, and by a grant of Ministry of Research and Innovation, CNCS—UEFISCDI, Project Number PN-III-P4-ID-PCE-2016-0190, within PNCDI III.
Appendix
1.1 Second order continuous dynamical systems that model Algorithm (2)
In what follows we emphasize the connections between Algorithm (2) and the continuous dynamical systems (3) and (4).
Consider (4) with the initial conditions \(x(t_0)=u_0,\,{\dot{x}}(t_0)=v_0,\,u_0,v_0\in {\mathbb {R}}^m\) and the governing second order differential equation
We will use the time discretization presented in [6]; that is, we take a fixed step size \(h> 0\) and consider \(\beta =1-\gamma h>0\), \(t_n = \frac{1}{\beta } nh\) and \(x_n = x(t_n).\) Then the implicit/explicit discretization of (4) leads to
where \(y_n\) is a linear combination of \(x_n\) and \(x_{n-1}\) and will be defined below.
Now, (55) can be rewritten as
which suggests choosing \(y_n\) in the form
However, for practical purposes, it is convenient to work with the re-indexation \(n\rightarrowtail n+\alpha \), which gives the following equivalent formulation
Hence, by taking \(h^2=s\) we get
which is exactly Algorithm (2).
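The displayed equations of this derivation are not reproduced above. The following is a hedged sketch of one discretization that is consistent with the surrounding text. It assumes that system (4) reads \(\ddot{x}(t)+\left( \gamma +\frac{\alpha }{t}\right) {\dot{x}}(t)+{\nabla }g(x(t))=0\) and that Algorithm (2) has the Nesterov-type form shown in the last line; both forms are assumptions here, and the intermediate steps need not coincide term by term with the original equation (55).

```latex
% Sketch under the assumptions stated in the lead-in (not the original displays).
% Uses t_n = nh/\beta and \beta = 1 - \gamma h, so that h\alpha/t_n = \alpha\beta/n.
\begin{align*}
&\frac{x_{n+1}-2x_n+x_{n-1}}{h^2}
  +\Big(\gamma+\frac{\alpha}{t_n}\Big)\frac{x_n-x_{n-1}}{h}
  +\nabla g(y_n)=0,\\
&x_{n+1}=x_n+\Big(1-\gamma h-\frac{\alpha\beta}{n}\Big)(x_n-x_{n-1})-h^2\nabla g(y_n)
        =x_n+\frac{\beta(n-\alpha)}{n}(x_n-x_{n-1})-h^2\nabla g(y_n),\\
&y_n:=x_n+\frac{\beta(n-\alpha)}{n}(x_n-x_{n-1})
  \quad\Longrightarrow\quad x_{n+1}=y_n-h^2\nabla g(y_n),\\
&\text{and, after } n\rightarrowtail n+\alpha \text{ and } h^2=s:\qquad
  y_n=x_n+\frac{\beta n}{n+\alpha}(x_n-x_{n-1}),\qquad x_{n+1}=y_n-s\nabla g(y_n).
\end{align*}
```

The gradient is evaluated at \(y_n\) rather than at \(x_n\), which is what makes the scheme implicit/explicit in the sense used above.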
Remark 25
Obviously, the form \(\beta =1-\gamma h>0\) already shows that \(\beta \in (0,1).\) We could not obtain Algorithm (2) from the continuous dynamical system (3) by a discretization similar to the one presented above. Nevertheless, we can show that (3) is the exact limit of Algorithm (2) in the sense of Su, Boyd and Candès [50].
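For concreteness, the iteration discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes that Algorithm (2) has the form \(y_n=x_n+\frac{\beta n}{n+\alpha }(x_n-x_{n-1}),\ x_{n+1}=y_n-s{\nabla }g(y_n)\), and the function names, the test function, the parameter values and the choice \(x_{-1}=x_0\) are all hypothetical.

```python
import math

def inertial_gradient(grad, x0, alpha=3.0, beta=0.5, s=0.05, iters=500):
    """Run the assumed inertial scheme
        y_n     = x_n + beta*n/(n+alpha) * (x_n - x_{n-1})
        x_{n+1} = y_n - s * grad(y_n)
    and return the last iterate as a list of floats."""
    x_prev = list(x0)  # assumption: x_{-1} = x_0, i.e. zero initial velocity
    x = list(x0)
    for n in range(iters):
        w = beta * n / (n + alpha)  # inertial coefficient, always in [0, beta)
        y = [xi + w * (xi - pi) for xi, pi in zip(x, x_prev)]
        g = grad(y)
        x_prev, x = x, [yi - s * gi for yi, gi in zip(y, g)]
    return x

def grad_g(x):
    """Gradient of the illustrative non-convex function
    g(x) = sum_i (x_i^2 + 3*sin(x_i)^2), whose only critical point is 0."""
    return [2.0 * xi + 3.0 * math.sin(2.0 * xi) for xi in x]

# The gradient of g is Lipschitz with constant 8; the step size s = 0.05 is
# simply chosen small relative to that constant for this illustration.
x_final = inertial_gradient(grad_g, [2.0, -1.5])
print(x_final)
```

The test function is non-convex (its second derivative \(2+6\cos (2x_i)\) changes sign), yet its only critical point is the origin, so the iterates should approach zero in both coordinates.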
In what follows we show that, by choosing appropriate values of \(\beta \), both the continuous second order dynamical system (3) and the continuous dynamical system (4) are exact limits of the numerical scheme (2).
To this end we take small step sizes in (2) and follow the approach of Su, Boyd and Candès in [50] (see also [27] for similar approaches). For this purpose we rewrite (2) in the form
and introduce the Ansatz \(x_n\approx x(n\sqrt{s})\) for some twice continuously differentiable function \(x : [0,+\infty ) \rightarrow {\mathbb {R}}^m\). We let \(n=\frac{t}{\sqrt{s}}\) and get \(x(t)\approx x_n,\,x(t+\sqrt{s})\approx x_{n+1},\,x(t-\sqrt{s})\approx x_{n-1}.\) Then, as the step size s goes to zero, from the Taylor expansion of x we obtain
and
Further, since
it follows \(\sqrt{s} {\nabla }g(y_n)=\sqrt{s}{\nabla }g(x_n)+ o(\sqrt{s})\). Consequently, (56) can be written as
or, equivalently
Hence,
Now, if we take \(\beta =1-\gamma {s}<1\) in (57) for some \(\frac{1}{{s}}>\gamma >0\), we obtain
After dividing by \(\sqrt{s}\) and letting \(s\rightarrow 0\), we obtain
which after division by t gives (3), that is,
Similarly, by taking \(\beta =1-\gamma \sqrt{s}<1\) in (57), for some \(\frac{1}{\sqrt{s}}>\gamma >0\), we obtain
After dividing by \(\sqrt{s}\) and letting \(s\rightarrow 0\), we get
which after division by t gives (4), that is,
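Since the displayed equations of this limiting argument are not reproduced above, the following is a hedged sketch of how the computation can be carried out. It assumes that Algorithm (2) reads \(x_{n+1}=x_n+\frac{\beta n}{n+\alpha }(x_n-x_{n-1})-s{\nabla }g(y_n)\); the intermediate lines are a reconstruction under that assumption and need not match the original equations (56) and (57) term by term.

```latex
% Sketch under the assumption stated in the lead-in; n = t/\sqrt{s}, so n/(n+\alpha) = t/(t+\alpha\sqrt{s}).
\begin{align*}
&\frac{x_{n+1}-x_n}{\sqrt{s}}=\dot{x}(t)+\tfrac12\ddot{x}(t)\sqrt{s}+o(\sqrt{s}),\qquad
 \frac{x_n-x_{n-1}}{\sqrt{s}}=\dot{x}(t)-\tfrac12\ddot{x}(t)\sqrt{s}+o(\sqrt{s}),\\
&(t+\alpha\sqrt{s})\Big(\dot{x}+\tfrac12\ddot{x}\sqrt{s}\Big)
  =\beta t\Big(\dot{x}-\tfrac12\ddot{x}\sqrt{s}\Big)
  -\sqrt{s}\,(t+\alpha\sqrt{s})\nabla g(x(t))+o(\sqrt{s}),\\
&\tfrac12\sqrt{s}\big((1+\beta)t+\alpha\sqrt{s}\big)\ddot{x}
  +\big((1-\beta)t+\alpha\sqrt{s}\big)\dot{x}
  +\sqrt{s}\,(t+\alpha\sqrt{s})\nabla g(x(t))=o(\sqrt{s}).\\[4pt]
&\beta=1-\gamma s:\quad \text{divide by }\sqrt{s},\ s\to 0
  \ \Longrightarrow\ t\ddot{x}+\alpha\dot{x}+t\nabla g(x)=0
  \ \Longrightarrow\ \ddot{x}+\frac{\alpha}{t}\dot{x}+\nabla g(x)=0,\\
&\beta=1-\gamma\sqrt{s}:\quad \text{divide by }\sqrt{s},\ s\to 0
  \ \Longrightarrow\ t\ddot{x}+(\gamma t+\alpha)\dot{x}+t\nabla g(x)=0
  \ \Longrightarrow\ \ddot{x}+\Big(\gamma+\frac{\alpha}{t}\Big)\dot{x}+\nabla g(x)=0.
\end{align*}
```

The only difference between the two cases is the size of \(\frac{1-\beta }{\sqrt{s}}\): it equals \(\gamma \sqrt{s}\rightarrow 0\) in the first case, so the constant damping term vanishes in the limit, and it equals \(\gamma \) in the second, so that term survives.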
Cite this article
László, S.C. Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization. Math. Program. 190, 285–329 (2021). https://doi.org/10.1007/s10107-020-01534-w
DOI: https://doi.org/10.1007/s10107-020-01534-w
Keywords
- Inertial algorithm
- Non-convex optimization
- Kurdyka–Łojasiewicz inequality
- Convergence rate
- Łojasiewicz exponent