Abstract
In a Hilbert space setting, for convex optimization, we analyze the convergence rate of a class of first-order algorithms involving inertial features. They can be interpreted as discrete time versions of inertial dynamics involving both viscous and Hessian-driven dampings. The geometrical damping driven by the Hessian intervenes in the dynamics in the form \(\nabla ^2 f (x(t)) \dot{x} (t)\). By treating this term as the time derivative of \( \nabla f (x (t)) \), this gives, in discretized form, first-order algorithms in time and space. In addition to the convergence properties attached to Nesterov-type accelerated gradient methods, the algorithms thus obtained are new and show a rapid convergence towards zero of the gradients. On the basis of a regularization technique using the Moreau envelope, we extend these methods to non-smooth convex functions with extended real values. The introduction of time scale factors makes it possible to further accelerate these algorithms. We also report numerical results on structured problems to support our theoretical findings.
Notes
One can even consider the more general case \(b(t)=1+b/(hk), b > 0\) for which our discussion remains true under minor modifications. But we do not pursue this for the sake of simplicity.
References
Álvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38(4), 1102–1119 (2000)
Álvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 81(8), 747–779 (2002)
Apidopoulos, V., Aujol, J.-F., Dossal, C.: Convergence rate of inertial Forward–Backward algorithm beyond Nesterov’s rule. Math. Program. Ser. B. 180, 137–156 (2020)
Attouch, H., Cabot, A.: Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differ. Equ. 263, 5412–5458 (2017)
Attouch, H., Cabot, A.: Convergence rates of inertial forward–backward algorithms. SIAM J. Optim. 28(1), 849–874 (2018)
Attouch, H., Cabot, A., Chbani, Z., Riahi, H.: Rate of convergence of inertial gradient dynamics with time-dependent viscous damping coefficient. Evol. Equ. Control Theory 7(3), 353–371 (2018)
Attouch, H., Chbani, Z., Riahi, H.: Fast proximal methods via time scaling of damped inertial dynamics. SIAM J. Optim. 29(3), 2227–2256 (2019)
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. Ser. B. 168, 123–175 (2018)
Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case \(\alpha \le 3\). ESAIM Control Optim. Calc. Var. 25, 2–35 (2019)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward–backward method is actually faster than \(1/k^2\). SIAM J. Optim. 26(3), 1824–1834 (2016)
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J. Optim. 24(1), 232–256 (2014)
Attouch, H., Peypouquet, J., Redont, P.: Fast convex minimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)
Attouch, H., Svaiter, B. F.: A continuous dynamical Newton-Like approach to solving monotone inclusions. SIAM J. Control Optim. 49(2), 574–598 (2011). Global convergence of a closed-loop regularized Newton method for solving monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 157(3), 624–650 (2013)
Aujol, J.-F., Dossal, Ch.: Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4), 2408–2433 (2015)
Aujol, J.-F., Dossal, C.: Optimal rate of convergence of an ODE associated to the Fast Gradient Descent schemes for \(b>0\) (2017). https://hal.inria.fr/hal-01547251v2
Bateman, H.: Higher Transcendental Functions, vol. 1. McGraw-Hill, New York (1953)
Bauschke, H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, Springer (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution. Lecture Notes 5, North Holland (1972)
Cabot, A., Engler, H., Gadat, S.: On the long time behavior of second order differential equations with asymptotically small dissipation. Trans. Am. Math. Soc. 361, 5983–6017 (2009)
Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the fast iterative shrinkage thresholding algorithm. J. Optim. Theory Appl. 166, 968–982 (2015)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
Gelfand, I.M., Zejtlin, M.: Printszip nelokalnogo poiska v sistemah avtomatich, Optimizatsii, Dokl. AN SSSR, 137, 295–298 (1961) (in Russian)
May, R.: Asymptotic for a second-order evolution equation with convex potential and vanishing damping term. Turk. J. Math. 41(3), 681–685 (2017)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Doklady 27, 372–376 (1983)
Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. 152(1–2), 381–404 (2015)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. U.S.S.R. Comput. Math. Math. Phys. 4, 1–17 (1964)
Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)
Siegel, W.: Accelerated first-order methods: differential equations and Lyapunov functions. arXiv:1903.05671v1 [math.OC] (2019)
Shi, B., Du, S.S., Jordan, M.I., Su, W.J.: Understanding the acceleration phenomenon via high-resolution differential equations. arXiv:submit/2440124 [cs.LG] 21 Oct 2018
Su, W.J., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. NIPS’14 27, 2510–2518 (2014)
Wilson, A.C., Recht, B., Jordan, M.I.: A Lyapunov analysis of momentum methods in optimization. arXiv:1611.02635 (2016)
Auxiliary results
1.1 Extended descent lemma
Lemma 1
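Let \(f: {{\mathcal {H}}}\rightarrow {\mathbb R}\) be a convex function whose gradient is L-Lipschitz continuous. Let \(s \in ]0,1/L]\). Then for all \((x,y) \in {{\mathcal {H}}}^2\), we have
$$\begin{aligned} f(y - s \nabla f (y)) \le f(x) + \left\langle \nabla f (y),\, y - x \right\rangle - \frac{s}{2}\Vert \nabla f (y)\Vert ^2 - \frac{s}{2}\Vert \nabla f (x) - \nabla f (y)\Vert ^2 . \end{aligned}$$
Proof
Denote \(y^+=y - s \nabla f (y)\). By the standard descent lemma applied to \(y^+\) and y, and since \(sL \le 1\), we have
$$\begin{aligned} f(y^+) \le f(y) - \frac{s}{2}\left( {2 - Ls}\right) \Vert \nabla f (y)\Vert ^2 \le f(y) - \frac{s}{2}\Vert \nabla f (y)\Vert ^2 . \end{aligned} \tag{29}$$
We now argue by duality between strong convexity and Lipschitz continuity of the gradient of a convex function. Indeed, using the Fenchel identity, we have
$$\begin{aligned} f(y) = \left\langle \nabla f (y),\, y \right\rangle - f^*(\nabla f (y)) . \end{aligned}$$
L-Lipschitz continuity of the gradient of f is equivalent to 1/L-strong convexity of its conjugate \(f^*\). This together with the fact that \((\nabla f)^{-1}=\partial f^*\) gives for all \((x,y) \in {{\mathcal {H}}}^2\),
$$\begin{aligned} f^*(\nabla f (y)) \ge f^*(\nabla f (x)) + \left\langle x,\, \nabla f (y) - \nabla f (x) \right\rangle + \frac{1}{2L}\Vert \nabla f (x) - \nabla f (y)\Vert ^2 . \end{aligned}$$
Inserting this inequality into the Fenchel identity above yields
$$\begin{aligned} f(y)&\le - f^*(\nabla f (x)) + \left\langle \nabla f (x),\, x \right\rangle + \left\langle \nabla f (y),\, y - x \right\rangle - \frac{1}{2L}\Vert \nabla f (x) - \nabla f (y)\Vert ^2 \\&= f(x) + \left\langle \nabla f (y),\, y - x \right\rangle - \frac{1}{2L}\Vert \nabla f (x) - \nabla f (y)\Vert ^2 \\&\le f(x) + \left\langle \nabla f (y),\, y - x \right\rangle - \frac{s}{2}\Vert \nabla f (x) - \nabla f (y)\Vert ^2 . \end{aligned}$$
Inserting the last bound into (29) completes the proof. \(\square \)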
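As a sanity check, the inequality of Lemma 1 can be tested numerically. The sketch below assumes the lemma reads \(f(y - s\nabla f(y)) \le f(x) + \langle \nabla f(y), y - x\rangle - \frac{s}{2}\Vert \nabla f(y)\Vert ^2 - \frac{s}{2}\Vert \nabla f(x) - \nabla f(y)\Vert ^2\), which is what the proof steps above lead to. The test function \(f(x)=\sum _i \sqrt{1+x_i^2}\) (convex, with 1-Lipschitz gradient) and the helper names `lemma_gap` and `min_gap` are illustrative choices, not taken from the paper.

```python
import math
import random

def f(x):
    # Illustrative smooth convex function; its gradient is 1-Lipschitz (L = 1).
    return sum(math.sqrt(1.0 + xi * xi) for xi in x)

def grad_f(x):
    return [xi / math.sqrt(1.0 + xi * xi) for xi in x]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def lemma_gap(x, y, s):
    """Right-hand side minus left-hand side of the assumed inequality (should be >= 0)."""
    gx, gy = grad_f(x), grad_f(y)
    y_plus = [yi - s * gi for yi, gi in zip(y, gy)]
    grad_diff = [a - b for a, b in zip(gx, gy)]
    rhs = (f(x) + dot(gy, [yi - xi for yi, xi in zip(y, x)])
           - 0.5 * s * dot(gy, gy) - 0.5 * s * dot(grad_diff, grad_diff))
    return rhs - f(y_plus)

def min_gap(num_samples=2000, dim=3, L=1.0, seed=0):
    """Smallest gap over random pairs (x, y) and random step sizes s in ]0, 1/L]."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(num_samples):
        x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        y = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        s = rng.uniform(1e-3, 1.0 / L)
        worst = min(worst, lemma_gap(x, y, s))
    return worst

if __name__ == "__main__":
    print("smallest observed gap:", min_gap())
```

A non-negative smallest gap (up to rounding) is consistent with the lemma on this particular function; it is of course not a proof.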
1.2 Proof of (27)
Proof
We have
By the Pythagoras relation, we then get
\(\square \)
1.3 Closed-form solutions of \(\text {(DIN-AVD)}_{\alpha ,\beta ,b}\,\) for quadratic functions
We here provide the closed-form solutions to \(\text {(DIN-AVD)}_{\alpha ,\beta ,b}\,\) for the quadratic objective \(f: {\mathbb R}^n \rightarrow {\mathbb R}\), \(f(x) = \frac{1}{2}\langle Ax,\,x \rangle \), where A is a symmetric positive definite matrix. The case of a positive semidefinite matrix A can be treated similarly by restricting the analysis to \(\ker (A)^\perp \). Projecting \(\text {(DIN-AVD)}_{\alpha ,\beta ,b}\,\) onto the eigenspaces of A, one has to solve n independent one-dimensional ODEs of the form
$$\begin{aligned} \ddot{x}_i(t) + \left( {\frac{\alpha }{t} + \beta (t)\lambda _i}\right) \dot{x}_i(t) + \lambda _i b(t) x_i(t) = 0, \quad i = 1, \ldots , n, \end{aligned}$$
where \(\lambda _i > 0\) is an eigenvalue of A. In the following, we drop the subscript i.
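Case \(\varvec{\beta (t) \equiv \beta , b(t)=b+\gamma /t, \beta \ge 0, b > 0, \gamma \ge 0}\): The ODE reads
$$\begin{aligned} \ddot{x}(t) + \left( {\frac{\alpha }{t} + \beta \lambda }\right) \dot{x}(t) + \lambda \left( {b + \frac{\gamma }{t}}\right) x(t) = 0 . \end{aligned} \tag{30}$$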
- If \(\beta ^2\lambda ^2 \ne 4b\lambda \): set
  $$\begin{aligned} \xi = \sqrt{\beta ^2\lambda ^2 - 4b\lambda }, \, \kappa = \lambda \frac{\gamma -\alpha \beta /2}{\xi }, \, \sigma = (\alpha -1)/2 . \end{aligned}$$
  Using the relationship between the Whittaker functions and Kummer’s confluent hypergeometric functions M and U, see [16], the solution to (30) can be shown to take the form
  $$\begin{aligned} x(t) = \xi ^{\alpha /2} e^{-(\beta \lambda +\xi )t/2}\left[ {c_1 M(\alpha /2-\kappa ,\alpha ,\xi t) + c_2 U(\alpha /2-\kappa ,\alpha ,\xi t)}\right] , \end{aligned}$$
  where \(c_1\) and \(c_2\) are constants given by the initial conditions.
- If \(\beta ^2\lambda ^2 = 4b\lambda \): set \(\zeta =2\sqrt{\lambda \left( {\gamma -\alpha \beta /2}\right) }\). The solution to (30) takes the form
  $$\begin{aligned} x(t) = t^{-\left( {\alpha -1}\right) /2}e^{-\beta \lambda t/2}\left[ {c_1 J_{\alpha -1}(\zeta \sqrt{t}) + c_2 Y_{\alpha -1}(\zeta \sqrt{t})}\right] , \end{aligned}$$
  where \(J_\nu \) and \(Y_\nu \) are the Bessel functions of the first and second kind.
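The closed form in the generic case can be checked numerically. The sketch below assumes that (30) is the scalar ODE \(\ddot{x} + (\alpha /t + \beta \lambda )\dot{x} + \lambda (b + \gamma /t)x = 0\), evaluates the M-branch of the solution (\(c_1 = 1\), \(c_2 = 0\), constant prefactor dropped) through the power series of Kummer's function, and measures the ODE residual by central finite differences. The function names and the parameter values used in the demo are illustrative assumptions.

```python
import math

def kummer_M(a, b, z, terms=200):
    """Kummer's function M(a, b, z) summed from its power series (moderate |z| only)."""
    total, term = 1.0, 1.0
    for n in range(terms):
        term *= (a + n) * z / ((b + n) * (n + 1))
        total += term
    return total

def closed_form(t, alpha, beta, lam, b, gamma):
    """M-branch of the closed form; assumes beta^2 lam^2 > 4 b lam so that xi is real."""
    xi = math.sqrt(beta**2 * lam**2 - 4.0 * b * lam)
    kappa = lam * (gamma - alpha * beta / 2.0) / xi
    envelope = math.exp(-(beta * lam + xi) * t / 2.0)
    return envelope * kummer_M(alpha / 2.0 - kappa, alpha, xi * t)

def ode_residual(t, alpha, beta, lam, b, gamma, h=1e-4):
    """Left-hand side of the assumed ODE (30), with derivatives by central differences."""
    def x(u):
        return closed_form(u, alpha, beta, lam, b, gamma)
    dx = (x(t + h) - x(t - h)) / (2.0 * h)
    d2x = (x(t + h) - 2.0 * x(t) + x(t - h)) / (h * h)
    return d2x + (alpha / t + beta * lam) * dx + lam * (b + gamma / t) * x(t)

if __name__ == "__main__":
    params = (3.0, 1.0, 5.0, 1.0, 0.5)  # alpha, beta, lambda, b, gamma (illustrative)
    for t in (1.0, 2.0):
        print(t, closed_form(t, *params), ode_residual(t, *params))
```

The residual should be small compared with the size of \(x(t)\); the remaining error comes from the finite-difference step and floating-point rounding.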
When \(\beta > 0\), one can clearly see the exponential decrease forced by the Hessian. From the asymptotic expansions of M, U, \(J_{\nu }\) and \(Y_{\nu }\) for large t, straightforward computations provide the behaviour of |x(t)| for large t as follows:
- If \(\beta ^2\lambda ^2 > 4b\lambda \), we have
  $$\begin{aligned} |x(t)| = {{\mathcal {O}}}\left( {t^{-\frac{\alpha }{2}+|\kappa |} e^{-\frac{\beta \lambda -\xi }{2}t}}\right) = {{\mathcal {O}}}\left( {e^{-\frac{b}{\beta }t - \left( {\frac{\alpha }{2}-|\kappa |}\right) \log (t)}}\right) , \end{aligned}$$
  where the second estimate uses \(\frac{\beta \lambda -\xi }{2} = \frac{2b\lambda }{\beta \lambda +\xi } \ge \frac{b}{\beta }\).
- If \(\beta ^2\lambda ^2 < 4b\lambda \), whence \(\xi \in i {\mathbb R}^+_*\) and \(\kappa \in i {\mathbb R}\), we have
  $$\begin{aligned} |x(t)| = {{\mathcal {O}}}\left( {t^{-\frac{\alpha }{2}} e^{-\frac{\beta \lambda }{2}t}}\right) . \end{aligned}$$
- If \(\beta ^2\lambda ^2 = 4b\lambda \), we have
  $$\begin{aligned} |x(t)| = {{\mathcal {O}}}\left( {t^{-\frac{2\alpha -1}{4}} e^{-\frac{\beta \lambda }{2}t}}\right) . \end{aligned}$$
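To see how the exponential part of these estimates depends on the eigenvalue, the following sketch returns the rate r in the factor \(e^{-rt}\) for each regime: \((\beta \lambda -\xi )/2\) when \(\xi \) is real and positive, and \(\beta \lambda /2\) otherwise. It relies only on the definition of \(\xi \); the function names and the sample values are illustrative assumptions.

```python
import math

def regime(beta, lam, b):
    """Classify the sign of beta^2 lam^2 - 4 b lam."""
    disc = beta**2 * lam**2 - 4.0 * b * lam
    if disc > 0.0:
        return "overdamped"
    if disc < 0.0:
        return "oscillatory"
    return "critical"

def exponential_rate(beta, lam, b):
    """Rate r such that the estimates above carry the factor exp(-r t)."""
    disc = beta**2 * lam**2 - 4.0 * b * lam
    if disc > 0.0:
        # xi is real; note (beta*lam - xi)/2 = 2*b*lam / (beta*lam + xi).
        return (beta * lam - math.sqrt(disc)) / 2.0
    # xi is zero or purely imaginary, so only exp(-beta*lam*t/2) remains.
    return beta * lam / 2.0

if __name__ == "__main__":
    beta, b = 1.0, 1.0  # illustrative values
    for lam in (0.5, 2.0, 4.0, 5.0, 50.0):
        print(lam, regime(beta, lam, b), exponential_rate(beta, lam, b))
```

In the overdamped regime the identity in the code comment shows that the rate lies strictly between \(b/\beta \) and \(2b/\beta \), and it approaches \(b/\beta \) as \(\lambda \) grows.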
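Case \(\varvec{\beta (t) = t^{\beta }, b(t)=ct^{\beta -1}, \beta \ge 0, c > 0}\): The ODE reads now
$$\begin{aligned} \ddot{x}(t) + \left( {\frac{\alpha }{t} + \lambda t^{\beta }}\right) \dot{x}(t) + c\lambda t^{\beta -1} x(t) = 0 . \end{aligned}$$
Let us make the change of variable \(t :=\tau ^{\frac{1}{\beta +1}}\). Let \(y(\tau ) :=x\left( {\tau ^{\frac{1}{\beta +1}}}\right) \). By the chain rule, it is straightforward to show that y obeys the ODE
$$\begin{aligned} y''(\tau ) + \left( {\frac{\alpha +\beta }{(\beta +1)\tau } + \frac{\lambda }{\beta +1}}\right) y'(\tau ) + \frac{c\lambda }{(\beta +1)^2\tau } y(\tau ) = 0 . \end{aligned}$$
This is a special case of (30), in which \(\alpha \), \(\beta \), b and \(\gamma \) are replaced by \(\frac{\alpha +\beta }{\beta +1}\), \(\frac{1}{\beta +1}\), 0 and \(\frac{c}{(\beta +1)^2}\), respectively. Since \(\beta \ge 0\) and \(\lambda > 0\), set
$$\begin{aligned} \xi = \frac{\lambda }{\beta +1}, \, \kappa = \frac{2c-\alpha -\beta }{2(\beta +1)} . \end{aligned}$$
It follows from the first case above that
$$\begin{aligned} x(t) = \xi ^{\frac{\alpha +\beta }{2(\beta +1)}} e^{-\frac{\lambda t^{\beta +1}}{\beta +1}}\left[ {c_1 M\left( {\frac{\alpha +\beta -c}{\beta +1},\frac{\alpha +\beta }{\beta +1},\frac{\lambda t^{\beta +1}}{\beta +1}}\right) + c_2 U\left( {\frac{\alpha +\beta -c}{\beta +1},\frac{\alpha +\beta }{\beta +1},\frac{\lambda t^{\beta +1}}{\beta +1}}\right) }\right] . \end{aligned}$$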
Asymptotic estimates can also be derived similarly to above. We omit the details for the sake of brevity.
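The change of variable can be checked numerically without any special functions. The sketch below assumes that the ODE in t is \(\ddot{x} + (\alpha /t + \lambda t^{\beta })\dot{x} + c\lambda t^{\beta -1}x = 0\) and that the rescaled ODE in \(\tau = t^{\beta +1}\) is \(y'' + \left( \frac{\alpha +\beta }{(\beta +1)\tau } + \frac{\lambda }{\beta +1}\right) y' + \frac{c\lambda }{(\beta +1)^2\tau }y = 0\), both obtained from the chain rule. It integrates the two equations with a classical Runge-Kutta scheme from \(t=\tau =1\), with matching initial data, and compares \(x(t)\) with \(y(t^{\beta +1})\). The integrator, the function names and the parameter values are illustrative assumptions.

```python
def rk4(rhs, t0, state, t1, steps):
    """Classical fourth-order Runge-Kutta for the system state' = rhs(t, state)."""
    h = (t1 - t0) / steps
    t, s = t0, list(state)
    for _ in range(steps):
        k1 = rhs(t, s)
        k2 = rhs(t + h / 2, [si + h / 2 * ki for si, ki in zip(s, k1)])
        k3 = rhs(t + h / 2, [si + h / 2 * ki for si, ki in zip(s, k2)])
        k4 = rhs(t + h, [si + h * ki for si, ki in zip(s, k3)])
        s = [si + h / 6 * (a + 2 * b + 2 * c + d)
             for si, a, b, c, d in zip(s, k1, k2, k3, k4)]
        t += h
    return s

def compare(alpha=3.0, beta=1.0, lam=2.0, c=1.0, t_end=2.0, steps=4000):
    """Return (x(t_end), y(t_end^(beta+1))) for the original and rescaled ODEs."""
    p = beta + 1.0

    def rhs_x(t, s):
        x, v = s
        return [v, -(alpha / t + lam * t**beta) * v - c * lam * t**(beta - 1.0) * x]

    def rhs_y(tau, s):
        y, w = s
        return [w, -((alpha + beta) / (p * tau) + lam / p) * w - c * lam / (p * p * tau) * y]

    # x(1) = 1, x'(1) = 0 corresponds to y(1) = 1, y'(1) = x'(1) / ((beta + 1) * 1^beta) = 0.
    x_end = rk4(rhs_x, 1.0, [1.0, 0.0], t_end, steps)[0]
    y_end = rk4(rhs_y, 1.0, [1.0, 0.0], t_end**p, steps)[0]
    return x_end, y_end

if __name__ == "__main__":
    print(compare())
```

If the two returned values agree to integration accuracy, the rescaled equation is consistent with the original one for these parameters.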
Cite this article
Attouch, H., Chbani, Z., Fadili, J. et al. First-order optimization algorithms via inertial systems with Hessian driven damping. Math. Program. 193, 113–155 (2022). https://doi.org/10.1007/s10107-020-01591-1
DOI: https://doi.org/10.1007/s10107-020-01591-1
Keywords
- Hessian driven damping
- Inertial optimization algorithms
- Nesterov accelerated gradient method
- Ravine method
- Time rescaling