Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

We investigate an inertial algorithm of gradient type in connection with the minimization of a non-convex differentiable function. The algorithm is formulated in the spirit of Nesterov’s accelerated convex gradient method. We prove some abstract convergence results which, applied to our numerical scheme, allow us to show that the generated sequences converge to a critical point of the objective function, provided a regularization of the objective function satisfies the Kurdyka–Łojasiewicz property. Further, we obtain convergence rates for the generated sequences and for the objective function values, formulated in terms of the Łojasiewicz exponent of a regularization of the objective function. Finally, some numerical experiments are presented in order to compare our numerical scheme with some algorithms well known in the literature.

References

  1. Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set Valued Anal. 9, 3–11 (2001)

  2. Apidopoulos, V., Aujol, J.F., Dossal, Ch.: Convergence rate of inertial Forward–Backward algorithm beyond Nesterov’s rule. Math. Program. 180, 137–156 (2020)

  3. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. Ser. B 116(1–2), 5–16 (2009)

  4. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for non-convex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  5. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  6. Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. Ser. B 168(1–2), 123–175 (2018)

  7. Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case \(\alpha \le 3\), ESAIM: COCV, 25, Article number 2 (2019)

  8. Attouch, H., Goudou, X., Redont, P.: The heavy ball with friction method, I. The continuous dynamical system: global exploration of the local minima of real-valued function by asymptotic analysis of a dissipative dynamical system. Commun. Contemp. Math. 2(1), 1–34 (2000)

  9. Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J. Optim. 24(1), 232–256 (2014)

  10. Attouch, H., Peypouquet, J., Redont, P.: Fast convex optimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)

  11. Aujol, J.F., Dossal, Ch., Rondepierre, A.: Optimal convergence rates for Nesterov acceleration. SIAM J. Optim. 29(4), 3131–3153 (2019)

  12. Aujol, J.F., Dossal, C.: Optimal rate of convergence of an ODE associated to the Fast Gradient Descent schemes for \(b > 0\). HAL preprint https://hal.inria.fr/hal-01547251v2/document

  13. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)

  14. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  15. Bégout, P., Bolte, J., Jendoubi, M.A.: On damped second-order gradient systems. J. Differ. Equ. 259, 3115–3143 (2015)

  16. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for non-convex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)

  17. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)

  18. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2006)

  19. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)

  20. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)

  21. Boţ, R.I., Csetnek, E.R.: Approaching nonsmooth non-convex optimization problems through first order dynamical systems with hidden acceleration and Hessian driven damping terms. Set Valued Var. Anal. 26, 227–245 (2018)

  22. Boţ, R.I., Csetnek, E.R.: A forward-backward dynamical approach to the minimization of the sum of a nonsmooth convex with a smooth non-convex function. ESAIM COCV 24(2), 463–477 (2018)

  23. Boţ, R.I., Csetnek, E.R.: Newton-like dynamics associated to non-convex optimization problems. In: Hosseini, S., Mordukhovich, B., Uschmajew, A. (eds.) Nonsmooth Optimization and Its Applications, International Series of Numerical Mathematics, vol. 170, pp. 131–149. Birkhäuser, Cham (2019)

  24. Boţ, R.I., Csetnek, E.R., Hendrich, C.: Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256, 472–487 (2015)

  25. Boţ, R.I., Csetnek, E.R., László, S.C.: Approaching nonsmooth non-convex minimization through second-order proximal-gradient dynamical systems. J. Evol. Equ. 18(3), 1291–1318 (2018)

  26. Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward–backward algorithm for minimizing the sum of two non-convex functions. EURO J. Comput. Optim. 4(1), 3–25 (2016)

  27. Boţ, R.I., Csetnek, E.R., László, S.C.: A second order dynamical approach with variable damping to non-convex smooth minimization. Appl. Anal. 99(3), 361–378 (2020)

  28. Boţ, R.I., Nguyen, D.K.: The proximal alternating direction method of multipliers in the non-convex setting: convergence analysis and rates. arXiv:1801.01994

  29. Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166(3), 968–982 (2015)

  30. Chill, R.: On the Łojasiewicz–Simon gradient inequality. J. Funct. Anal. 201, 572–601 (2003)

  31. Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward–backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162(1), 107–132 (2014)

  32. Combettes, P.L., Glaudin, L.E.: Quasinonexpansive iterations on the affine hull of orbits: from Mann’s mean value algorithm to inertial methods. SIAM J. Optim. 27(4), 2356–2380 (2017)

  33. van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84(2), 497–540 (1996)

  34. Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka–Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165(3), 874–900 (2015)

  35. Garrigos, G., Rosasco, L., Villa, S.: Convergence of the Forward-Backward algorithm: beyond the worst-case with the help of geometry, https://arxiv.org/pdf/1703.09477.pdf

  36. Ghadimi, E., Feyzmahdavian, H.R., Johansson, M.: Global convergence of the heavy-ball method for convex optimization. In: 2015 European Control Conference (ECC). IEEE, pp. 310–315 (2015)

  37. Haraux, A., Jendoubi, M.: Convergence of solutions of second-order gradient-like systems with analytic nonlinearities. J. Differ. Equ. 144(2), 313–320 (1998)

  38. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier (Grenoble) 48(3), 769–783 (1998)

  39. Lessard, L., Recht, B., Packard, A.: Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26, 57–95 (2016)

  40. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18(5), 1–34 (2018)

  41. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique Paris, pp. 87–89 (1963)

  42. Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51(2), 311–325 (2015)

  43. Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983). (Russian)

  44. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Dordrecht (2004)

  45. Ochs, P.: Local convergence of the heavy-ball method and ipiano for non-convex optimization. J. Optim. Theory Appl. 177(1), 153–180 (2018)

  46. Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014)

  47. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. U.S.S.R. Comput. Math. Math. Phys. 4(5), 1–17 (1964)

  48. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, Fundamental Principles of Mathematical Sciences, vol. 317. Springer, Berlin (1998)

  49. Simon, L.: Asymptotics for a class of nonlinear evolution equations, with applications to geometric problems. Ann. Math. 118(3), 525–571 (1983)

  50. Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17, 1–43 (2016)

  51. Sun, T., Yin, P., Li, D., Huang, C., Guan, L., Jiang, H.: Non-ergodic convergence analysis of heavy-ball algorithms. In: The Thirty-Third AAAI Conference on Artificial Intelligence (2019)

  52. Zavriev, S.K., Kostyuk, F.V.: Heavy-ball method in non-convex optimization problems. Comput. Math. Model. 4, 336–341 (1993)

Acknowledgements

The author is thankful to two anonymous referees for their valuable remarks and suggestions, which led to an improvement of the quality of the paper.

Author information

Corresponding author

Correspondence to Szilárd Csaba László.

Additional information

This work was supported by a grant of the Ministry of Research and Innovation, CNCS–UEFISCDI, Project Number PN-III-P1-1.1-TE-2016-0266, and by a grant of the Ministry of Research and Innovation, CNCS–UEFISCDI, Project Number PN-III-P4-ID-PCE-2016-0190, within PNCDI III.

Appendix

1.1 Second order continuous dynamical systems that model Algorithm (2)

In what follows we emphasize the connections between Algorithm (2) and the continuous dynamical systems (3) and (4).

Consider (4) with the initial conditions \(x(t_0)=u_0,\,{\dot{x}}(t_0)=v_0,\,u_0,v_0\in {\mathbb {R}}^m\) and the governing second order differential equation

$$\begin{aligned} \ddot{x}(t)+\left( \gamma +\frac{\alpha }{t}\right) {\dot{x}}(t)+{\nabla }g(x(t))=0,\,\gamma >0,\,\alpha \in {\mathbb {R}}. \end{aligned}$$

We will use the time discretization presented in [6], that is, we take the fixed stepsize \(h> 0\) and consider \(\beta =1-\gamma h>0\), \(t_n = \frac{1}{\beta } nh\) and \(x_n = x(t_n).\) Then the implicit/explicit discretization of (4) leads to

$$\begin{aligned} \frac{1}{h^2}(x_{n+1}-2x_n+x_{n-1})+\left( \frac{\gamma }{h}+\frac{\alpha \beta }{nh^2}\right) (x_n-x_{n-1})+{\nabla }g(y_n)=0, \end{aligned}$$
(55)

where \(y_n\) is a linear combination of \(x_n\) and \(x_{n-1}\) and will be defined below.

Now, (55) can be rewritten as

$$\begin{aligned} x_{n+1}= x_n+\left( \beta -\frac{\alpha \beta }{ n}\right) (x_n-x_{n-1})-h^2{\nabla }g(y_n), \end{aligned}$$

which suggests choosing \(y_n\) of the form

$$\begin{aligned} y_n=x_n+\left( \beta -\frac{\alpha \beta }{ n}\right) (x_n-x_{n-1}). \end{aligned}$$

However, for practical purposes, it is convenient to work with the re-indexation \(n\rightarrowtail n+\alpha \), which yields the equivalent formulation

$$\begin{aligned} y_n=x_n+\frac{\beta n}{ n+\alpha }(x_n-x_{n-1}). \end{aligned}$$

Hence, by taking \(h^2=s\) we get

$$\begin{aligned} x_{n+1}= x_n+\frac{\beta n}{ n+\alpha }(x_n-x_{n-1})-s{\nabla }g(y_n), \end{aligned}$$

which is exactly Algorithm (2).
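
For illustration only, a minimal Python sketch of this iteration might look as follows. The gradient routine grad_g, the parameter values and the stopping rule are placeholders chosen here and are not prescribed by the paper; recall that \(\beta \in (0,1)\), and the step size \(s\) is assumed to be compatible with the Lipschitz constant of \({\nabla }g\).

```python
import numpy as np

def inertial_gradient(grad_g, x0, alpha=3.0, beta=0.9, s=0.01, n_iter=500):
    """Illustrative sketch of the inertial scheme (2):
        y_n     = x_n + beta*n/(n+alpha) * (x_n - x_{n-1}),
        x_{n+1} = y_n - s * grad_g(y_n).
    Parameter values are placeholders, not those used in the paper."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()                      # take x_0 = x_{-1} for simplicity (illustrative choice)
    for n in range(1, n_iter + 1):
        inertia = beta * n / (n + alpha)   # inertial coefficient of step n
        y = x + inertia * (x - x_prev)     # extrapolated point y_n
        x_prev, x = x, y - s * grad_g(y)   # gradient step taken at y_n
    return x
```

For instance, for \(g(x)=\frac{1}{2}\Vert x\Vert ^2\) one may call `inertial_gradient(lambda x: x, np.ones(2))`, which drives the iterates towards the unique critical point \(0\).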

Remark 25

Obviously, the very form \(\beta =1-\gamma h>0\) shows that \(\beta \in (0,1).\) We could not obtain Algorithm (2) from the continuous dynamical system (3) via a discretization similar to the one presented above. Nevertheless, we can show that (3) is the exact limit of Algorithm (2) in the sense of Su, Boyd and Candès [50].

In what follows we show that, by choosing appropriate values of \(\beta \), both the continuous second order dynamical system (3) and the continuous dynamical system (4) arise as exact limits of the numerical scheme (2).

To this end we take small step sizes in (2) and follow the same approach as Su, Boyd and Candès in [50] (see also [27] for similar approaches). For this purpose we rewrite (2) in the form

$$\begin{aligned} \frac{x_{n+1}-x_n}{\sqrt{s}}=\frac{\beta n}{n+\alpha }\cdot \frac{x_n-x_{n-1}}{\sqrt{s}}-\sqrt{s}{\nabla }g(y_n) \ \forall n \ge 1 \end{aligned}$$
(56)

and introduce the Ansatz \(x_n\approx x(n\sqrt{s})\) for some twice continuously differentiable function \(x : [0,+\infty ) \rightarrow {\mathbb {R}}^m\). We let \(n=\frac{t}{\sqrt{s}}\) and get \(x(t)\approx x_n,\,x(t+\sqrt{s})\approx x_{n+1},\,x(t-\sqrt{s})\approx x_{n-1}.\) Then, as the step size s goes to zero, from the Taylor expansion of x we obtain

$$\begin{aligned} \frac{x_{n+1}-x_n}{\sqrt{s}}={\dot{x}}(t)+\frac{1}{2}\ddot{x}(t)\sqrt{s}+o(\sqrt{s}) \end{aligned}$$

and

$$\begin{aligned} \frac{x_n-x_{n-1}}{\sqrt{s}}={\dot{x}}(t)-\frac{1}{2}\ddot{x}(t)\sqrt{s}+o(\sqrt{s}). \end{aligned}$$

Further, since

$$\begin{aligned} \sqrt{s}\Vert {\nabla }g(y_n)-{\nabla }g(x_n)\Vert \le \sqrt{s} L_g\Vert y_n-x_n\Vert =\sqrt{s} L_g\left| \frac{\beta n}{n+\alpha }\right| \Vert x_n-x_{n-1}\Vert =o(\sqrt{s}), \end{aligned}$$

it follows that \(\sqrt{s} {\nabla }g(y_n)=\sqrt{s}{\nabla }g(x_n)+ o(\sqrt{s})\). Consequently, (56) can be written as

$$\begin{aligned}&{\dot{x}}(t)+\frac{1}{2}\ddot{x}(t)\sqrt{s}+ o(\sqrt{s})\\&\quad = \frac{\beta t}{t+\alpha \sqrt{s}}\left( {\dot{x}}(t)-\frac{1}{2}\ddot{x}(t)\sqrt{s}+ o(\sqrt{s})\right) -\sqrt{s}{\nabla }g(x(t))+ o(\sqrt{s}) \end{aligned}$$

or, equivalently

$$\begin{aligned}&(t+\alpha \sqrt{s})\left( {\dot{x}}(t)+\frac{1}{2}\ddot{x}(t)\sqrt{s}+o(\sqrt{s})\right) \\&\quad = \beta t\left( {\dot{x}}(t)-\frac{1}{2}\ddot{x}(t)\sqrt{s}+o(\sqrt{s})\right) -\sqrt{s}(t+\alpha \sqrt{s}){\nabla }g(x(t))+o(\sqrt{s}). \end{aligned}$$

Hence,

$$\begin{aligned} \frac{1}{2}\left( \alpha \sqrt{s}+(1+\beta )t\right) \ddot{x}(t)\sqrt{s}+\left( (1-\beta )t+\alpha \sqrt{s}\right) {\dot{x}}(t)+\sqrt{s}(t+\alpha \sqrt{s}){\nabla }g(x(t))=o(\sqrt{s}). \end{aligned}$$
(57)

Now, if we take \(\beta =1-\gamma {s}<1\) in (57) for some \(\frac{1}{{s}}>\gamma >0\), we obtain

$$\begin{aligned} \frac{1}{2}\left( \alpha \sqrt{s}+(2-\gamma {s})t\right) \ddot{x}(t)\sqrt{s}+\left( \gamma {s}t+\alpha \sqrt{s}\right) {\dot{x}}(t)+\sqrt{s}(t+\alpha \sqrt{s}){\nabla }g(x(t))=o(\sqrt{s}). \end{aligned}$$

After dividing by \(\sqrt{s}\) and letting \(s\rightarrow 0\), we obtain

$$\begin{aligned} t\ddot{x}(t)+\alpha {\dot{x}}(t)+t{\nabla }g(x(t))=0, \end{aligned}$$

which after division by t gives (3), that is,

$$\begin{aligned} \ddot{x}(t)+\frac{\alpha }{t}{\dot{x}}(t)+{\nabla }g(x(t))=0. \end{aligned}$$
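
This computation can also be checked symbolically. The following sketch (using SymPy, with a one-dimensional placeholder G standing for \({\nabla }g\) and \(h=\sqrt{s}\)) expands the residual of (56) for \(\beta =1-\gamma {s}\) and recovers the above equation; it is only a sanity check under these assumptions and is not part of the paper.

```python
import sympy as sp

# h plays the role of sqrt(s); G stands for the gradient of g (one-dimensional case).
t, h, alpha, gamma = sp.symbols('t h alpha gamma', positive=True)
x = sp.Function('x')
G = sp.Function('G')

# beta*n/(n+alpha) with beta = 1 - gamma*h**2 and n = t/h simplifies to the coefficient below.
coeff = (1 - gamma*h**2) * t / (t + alpha*h)

# Residual of scheme (56) under the Ansatz x_n ~ x(n*h).
residual = (x(t + h) - x(t))/h - coeff*(x(t) - x(t - h))/h + h*G(x(t))

# Taylor expansion in h: the h^0 term cancels and the h^1 coefficient is the ODE (3).
expansion = residual.series(h, 0, 3).removeO()
print(sp.simplify(sp.expand(expansion).coeff(h, 1)))
# Expected output (schematically): Derivative(x(t), (t, 2)) + alpha*Derivative(x(t), t)/t + G(x(t))
```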

Similarly, by taking \(\beta =1-\gamma \sqrt{s}<1\) in (57), for some \(\frac{1}{\sqrt{s}}>\gamma >0\), we obtain

$$\begin{aligned} \frac{1}{2}\left( \alpha \sqrt{s}+(2-\gamma \sqrt{s})t\right) \ddot{x}(t)\sqrt{s}+\left( \gamma \sqrt{s}t+\alpha \sqrt{s}\right) {\dot{x}}(t)+\sqrt{s}(t+\alpha \sqrt{s}){\nabla }g(x(t))=o(\sqrt{s}). \end{aligned}$$

After dividing by \(\sqrt{s}\) and letting \(s\rightarrow 0\), we get

$$\begin{aligned} t\ddot{x}(t)+(\gamma t+\alpha ){\dot{x}}(t)+t{\nabla }g(x(t))=0, \end{aligned}$$

which after division by t gives (4), that is,

$$\begin{aligned} \ddot{x}(t)+\left( \gamma +\frac{\alpha }{t}\right) {\dot{x}}(t)+{\nabla }g(x(t))=0. \end{aligned}$$
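
Finally, as a purely illustrative companion to this limit, one may integrate (4) numerically and compare the resulting trajectory with the iterates of Algorithm (2) under the identification \(x_n\approx x(t_n)\). The sketch below uses a plain explicit Euler scheme with hypothetical parameter values; it is not the numerical method of the paper.

```python
import numpy as np

def integrate_ode4(grad_g, x0, v0, gamma=1.0, alpha=3.0, t0=1.0, T=20.0, dt=1e-3):
    """Explicit Euler integration of the dynamical system (4):
        x''(t) + (gamma + alpha/t) x'(t) + grad g(x(t)) = 0,
    started at t0 > 0 so that the coefficient alpha/t stays bounded.
    All parameter values are illustrative only."""
    x = np.asarray(x0, dtype=float)
    v = np.asarray(v0, dtype=float)
    t = t0
    while t < T:
        a = -(gamma + alpha/t)*v - grad_g(x)   # acceleration prescribed by (4)
        x = x + dt*v                           # update position
        v = v + dt*a                           # update velocity
        t += dt
    return x
```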

Cite this article

László, S.C. Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization. Math. Program. 190, 285–329 (2021). https://doi.org/10.1007/s10107-020-01534-w
