A frequency-domain analysis of inexact gradient methods

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

We study robustness properties of some iterative gradient-based methods for strongly convex functions, as well as for the larger class of functions with sector-bounded gradients, under a relative error model. Proofs of the corresponding convergence rates are based on frequency-domain criteria for the stability of nonlinear systems. Applications are given to inexact versions of gradient descent and the Triple Momentum Method. To further emphasize the usefulness of frequency-domain methods, we derive improved analytic bounds for the convergence rate of Nesterov’s accelerated method (in the exact setting) on strongly convex functions.
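
To make the relative error model concrete, here is a minimal sketch in Python of inexact gradient descent on a strongly convex quadratic. It illustrates the setting only and is not the paper's algorithm or notation: the oracle below returns a gradient whose error is bounded by a fixed fraction delta of the true gradient norm, and the constants m, L, alpha, and delta are assumptions chosen for the example.

```python
import numpy as np

# Illustrative sketch: gradient descent on f(x) = 0.5 * x^T Q x with an inexact
# oracle returning g such that ||g - grad f(x)|| <= delta * ||grad f(x)||
# (one common reading of a "relative error" model; names and constants here are
# assumptions for illustration, not the paper's notation).

rng = np.random.default_rng(0)
n = 20
m, L = 1.0, 10.0                        # strong convexity and smoothness constants
Q = np.diag(np.linspace(m, L, n))       # f is m-strongly convex and L-smooth

def inexact_gradient(x, delta):
    """Exact gradient Q @ x corrupted by a norm-bounded relative error."""
    g = Q @ x
    e = rng.standard_normal(n)
    e *= delta * np.linalg.norm(g) / np.linalg.norm(e)   # enforce ||e|| = delta * ||g||
    return g + e

alpha = 2.0 / (m + L)                   # classical step size for exact gradient descent
delta = 0.1                             # relative error level
x = rng.standard_normal(n)
for k in range(201):
    if k % 50 == 0:
        print(k, np.linalg.norm(x))     # distance to the minimizer x* = 0
    x = x - alpha * inexact_gradient(x, delta)
```

For small delta the iterates still converge linearly to the minimizer; this is the kind of robustness the paper quantifies, both for strongly convex functions and for the larger class of functions with sector-bounded gradients.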

Notes

  1. For the reader familiar with the control-theoretic absolute stability problem, the convergence result quoted above shows that gradient descent satisfies a version of the Aizerman conjecture.

  2. In the language of the absolute stability problem, inexact gradient descent satisfies a version of the Kalman conjecture.

  3. In fact, the choice of parameters (1.15) was originally motivated by a frequency-domain analysis; for more, see §4 and [61].

  4. The reader should ignore the statement about the finiteness of \(V_{\mathrm{f}}\), which is famously false [63].

  5. [52] explicitly states that A should not have spectrum on the circle \(|z| = \rho\), but this assumption is unnecessary. It can always be removed when \((A, B)\) is controllable by shifting the eigenvalues off \(|z| = \rho\) via linear feedback: an equivalent characterization of controllability is that for any monic polynomial there is a matrix N such that the characteristic polynomial of \(A + BN\) is the given polynomial [50, §34, Theorem 1]. The corresponding linear change of variables \((x, u) \mapsto (x, u + Nx)\) does not affect the validity of (FDI) (it corresponds to a congruence transformation of the Popov function by a rational matrix function), and leaves invariant any solution of (LMI). A small numerical illustration of this pole-shifting argument appears after these notes.

  6. In general, this is not true of every solution to (LMI).

  7. In reference to the remarks preceding Lemma 2.7, note that \(\Pi(\bar{z}, z) \equiv 0\) when \(|z| = \rho_{\mathrm{GD}}\) and \(\alpha = 2/(m+L)\).
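
As a small numerical illustration of the pole-shifting argument in Note 5, the sketch below assumes a controllable pair (A, B) and uses SciPy's pole placement routine to find N such that the eigenvalues of A + BN are moved off the circle |z| = rho; the matrices and target eigenvalues are arbitrary illustrative choices, not data from the paper.

```python
import numpy as np
from scipy.signal import place_poles

rho = 0.8
A = np.array([[rho, 1.0],
              [0.0, rho]])      # both eigenvalues sit exactly on |z| = rho
B = np.array([[0.0],
              [1.0]])           # (A, B) is controllable

# place_poles returns K with eig(A - B K) at the requested locations,
# so N = -K gives eig(A + B N) at those locations, strictly inside |z| = rho.
targets = np.array([0.5 * rho, 0.6 * rho])
K = place_poles(A, B, targets).gain_matrix
N = -K

print(np.abs(np.linalg.eigvals(A + B @ N)))   # approx [0.40, 0.48], off the circle
```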

References

  1. Ahmad, N.S., Carrasco, J., Heath, W.P.: A less conservative LMI condition for stability of discrete-time systems with slope-restricted nonlinearities. IEEE Trans. Autom. Control 60(6), 1692–1697 (2014)

  2. Arora, S., Ge, R., Ma, T., Moitra, A.: Simple, efficient, and neural algorithms for sparse coding. In: Conference on Learning Theory, pp. 113–149. PMLR (2015)

  3. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont, Massachusetts (1999)

  4. Boczar, R., Lessard, L., Recht, B.: Exponential convergence bounds using integral quadratic constraints. In: 2015 54th IEEE Conference on Decision and Control (CDC), pp. 7516–7521. IEEE (2015)

  5. Badithela, A., Seiler, P.: Analysis of the heavy-ball algorithm using integral quadratic constraints. In: 2019 American Control Conference (ACC), pp. 4081–4085. IEEE (2019)

  6. Chen, Y., Candes, E.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. In: Advances in Neural Information Processing Systems, pp. 739–747 (2015)

  7. Cohen, M.B., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. arXiv:1805.12591 (2018)

  8. Cyrus, S., Hu, B., Van Scoy, B., Lessard, L.: A robust accelerated optimization algorithm for strongly convex functions. In: 2018 Annual American Control Conference (ACC), pp. 1376–1381. IEEE (2018)

  9. Candes, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015)

  10. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)

  11. Drummond, R., Duncan, S.: Accelerated gradient methods with memory. arXiv:1805.09077 (2018)

  12. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods with inexact oracle: the strongly convex case. CORE discussion papers, 2013016 (2013)

  13. Devolder, O., Glineur, F., Nesterov, Y.: Intermediate gradient methods for smooth convex problems with inexact oracle. Technical report (2013)

  14. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1–2), 37–75 (2014)

  15. de Klerk, E., Glineur, F., Taylor, A.: Worst-case convergence analysis of gradient and Newton methods through semidefinite programming performance estimation. arXiv:1709.05191 (2017)

  16. de Klerk, E., Glineur, F., Taylor, A.B.: On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11(7), 1185–1199 (2017)

  17. Drori, Y., Taylor, A.B.: Efficient first-order methods for convex minimization: a constructive approach. Math. Program. 184, 183–220 (2020). https://doi.org/10.1007/s10107-019-01410-2

  18. Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145(1–2), 451–482 (2014)

  19. França, G., Bento, J.: An explicit rate bound for over-relaxed ADMM. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 2104–2108. IEEE (2016)

  20. Fazlyab, M., Ribeiro, A., Morari, M., Preciado, V.M.: Analysis of optimization algorithms via integral quadratic constraints: nonstrongly convex problems. SIAM J. Optim. 28(3), 2654–2689 (2018)

  21. Fetzer, M., Scherer, C.W.: Absolute stability analysis of discrete time feedback interconnections. IFAC-PapersOnLine 50(1), 8447–8453 (2017)

  22. Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3), A1380–A1405 (2012)

  23. Gannot, O.: Frequency criteria for exponential stability. arXiv:1910.10855 (2019)

  24. Gramlich, D., Ebenbauer, C., Scherer, C.W.: Convex synthesis of accelerated gradient algorithms for optimization and saddle point problems using Lyapunov functions. arXiv:2006.09946 (2020)

  25. Hu, B., Lessard, L.: Control interpretations for first-order optimization methods. In: 2017 American Control Conference (ACC), pp. 3114–3119. IEEE (2017)

  26. Hu, B., Lessard, L.: Dissipativity theory for Nesterov's accelerated method. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1549–1557 (2017)

  27. Hassan-Moghaddam, S., Jovanović, M.R: Proximal gradient flow and Douglas–Rachford splitting dynamics: global exponential stability via integral quadratic constraints. arXiv:1908.09043 (2019)

  28. Hu, B., Seiler, P.: Exponential decay rate conditions for uncertain linear systems using integral quadratic constraints. IEEE Trans. Autom. Control 61(11), 3631–3637 (2016)

  29. Hu, B., Syed, U.A.: Characterizing the exact behaviors of temporal difference learning algorithms using Markov jump linear system theory. arXiv:1906.06781 (2019)

  30. Hu, B., Seiler, P., Lessard, L.: Analysis of approximate stochastic gradient using quadratic constraints and sequential semidefinite programs. arXiv:1711.00987 (2017)

  31. Hu, B., Seiler, P., Lessard, L.: Analysis of biased stochastic gradient descent using sequential semidefinite programs. Math. Program., 1–26 (2020)

  32. Hu, B., Wright, S., Lessard, L.: Dissipativity theory for accelerating stochastic variance reduction: a unified analysis of SVRG and Katyusha using semidefinite programs. arXiv:1806.03677 (2018)

  33. Jury, E.I., Lee, B.W.: A stability theory for multinonlinear control systems. Third IFAC World Congr. 28, A1–A11 (1966)

  34. Jury, E.I., Lee, B.: On the stability of a certain class of nonlinear sampled-data systems. IEEE Trans. Autom. Control 9(1), 51–61 (1964)

  35. Kalman, R.E.: Lectures on controllability and observability. In: Controllability and Observability, pp. 1–149. Springer (2010)

  36. Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. 159(1–2), 81–107 (2016)

  37. Kim, D., Fessler, J.A.: On the convergence analysis of the optimized gradient method. J. Optim. Theory Appl. 172(1), 187–205 (2017)

  38. Kleinberg, R., Li, Y., Yuan, Y.: An alternative view: When does SGD escape local minima? arXiv:1802.06175 (2018)

  39. Kalman, R.E., Szegö, G.: Sur la stabilité absolue d'un système d'équations aux différences finies. C. R. Acad. Sci. Paris

  40. Lessard, L., Recht, B., Packard, A.: Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26(1), 57–95 (2016)

  41. Lessard, L., Seiler, P.: Direct synthesis of iterative algorithms with bounds on achievable worst-case convergence rate. arXiv:1904.09046 (2019)

  42. Lurie, A.I.: Some Nonlinear Problems in the Theory of Automatic Control. HM Stationery Office, London (1957)

  43. Li, Y., Yuan, Y.: Convergence analysis of two-layer neural networks with ReLU activation. arXiv:1705.09886 (2017)

  44. Michalowsky, S.: Design of Distributed and Robust Optimization Algorithms: A Systems Theoretic Approach. Logos Verlag Berlin GmbH (2020)

  45. Megretski, A., Rantzer, A.: System analysis via integral quadratic constraints. IEEE Trans. Autom. Control 42(6), 819–830 (1997)

  46. Michalowsky, S., Scherer, C., Ebenbauer, C.: Robust and structure exploiting optimization algorithms: an integral quadratic constraint approach. arXiv:1905.00279 (2019)

  47. Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Berlin (2018)

  48. Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.I.: A general analysis of the convergence of ADMM. arXiv:1502.02009 (2015)

  49. O’Shea, R., Younis, M.: A frequency-time domain stability criterion for sampled-data systems. IEEE Trans. Autom. Control 12(6), 719–724 (1967)

  50. Popov, V.M., Georgescu, R.: Hyperstability of Control Systems. Springer, Berlin (1973)

  51. Polyak, B.T.: Introduction to Optimization, vol. 1. Optimization Software, Publications Division, New York (1987)

  52. Rantzer, A.: On the Kalman–Yakubovich–Popov lemma. Syst. Control Lett. 28(1), 7–10 (1996)

  53. Ryu, E.K., Taylor, A.B., Bergeling, C., Giselsson, P.: Operator splitting performance estimation: tight contraction factors and optimal parameter selection. arXiv:1812.00146 (2018)

  54. Safavi, S., Joshi, B., França, G., Bento, J.: An explicit convergence rate for Nesterov's method from SDP. In: 2018 IEEE International Symposium on Information Theory (ISIT), pp. 1560–1564. IEEE (2018)

  55. Sun, R., Luo, Z.-Q.: Guaranteed matrix completion via non-convex factorization. IEEE Trans. Inf. Theory 62(11), 6535–6579 (2016)

  56. Taylor, A., Bach, F.: Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. arXiv:1902.00947 (2019)

  57. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3), 1283–1313 (2017)

  58. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. 161(1–2), 307–345 (2017)

  59. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. J. Optim. Theory Appl. 178(2), 455–476 (2018)

  60. Taylor, A., Van Scoy, B., Lessard, L.: Lyapunov functions for first-order methods: tight automated convergence guarantees. arXiv:1803.06073 (2018)

  61. Van Scoy, B., Freeman, R.A., Lynch, K.M.: The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Syst. Lett. 2(1), 49–54 (2017)

  62. Willems, J.: Least squares stationary optimal control and the algebraic Riccati equation. IEEE Trans. Autom. Control 16(6), 621–634 (1971)

  63. Willems, J.: On the existence of a nonpositive solution to the Riccati equation. IEEE Trans. Autom. Control 19(5), 592–593 (1974)

  64. Willems, J.C.: Dissipative dynamical systems part I: general theory. Arch. Ration. Mech. Anal. 45(5), 321–351 (1972)

  65. Willems, J.C.: Dissipative dynamical systems part II: linear systems with quadratic supply rates. Arch. Ration. Mech. Anal. 45(5), 352–393 (1972)

  66. Willems, J.C., Polderman, J.W.: Introduction to Mathematical Systems Theory: A Behavioral Approach, vol. 26. Springer, Berlin (1997)

  67. Xiong, H., Chi, Y., Hu, B., Zhang, W.: Analytical convergence regions of accelerated gradient descent in nonconvex optimization under regularity condition. Automatica 113, 108715 (2020)

Acknowledgements

The author is indebted to both anonymous referees for their close readings and numerous suggestions on how to improve the paper. Special thanks are due to the first referee, who pointed out that a proof of Proposition 1.5 can already be inferred from [dKGT1].

Author information

Correspondence to Oran Gannot.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gannot, O. A frequency-domain analysis of inexact gradient methods. Math. Program. 194, 975–1016 (2022). https://doi.org/10.1007/s10107-021-01665-8
