A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems

Computational Optimization and Applications

Abstract

In this paper, we describe and establish the iteration-complexity of two accelerated composite gradient (ACG) variants for solving a smooth nonconvex composite optimization problem whose objective function is the sum of a nonconvex differentiable function f with a Lipschitz continuous gradient and a simple nonsmooth closed convex function h. When f is convex, the first ACG variant reduces to the well-known FISTA for a specific choice of the input, and hence it can be viewed as a natural extension of FISTA to the nonconvex setting. The first variant requires an input pair (M, m) such that f is m-weakly convex, \(\nabla f\) is M-Lipschitz continuous, and \(m \le M\) (possibly \(m<M\)); such a pair is usually hard to obtain or poorly estimated. The second variant, on the other hand, can start from an arbitrary input pair (M, m) of positive scalars, and its complexity is shown to be no worse than, and in some cases better than, that of the first variant for a large range of input pairs. Finally, numerical results are provided to illustrate the efficiency of the two ACG variants.
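
In symbols, the setting described above is the composite problem below; this is a reader's sketch of the notation in the abstract (the precise definitions are fixed in the body of the paper, and the ambient space is written as \(\mathbb {R}^n\) here for concreteness):

$$\begin{aligned} \min _{x\in \mathbb {R}^n} \; \phi (x) := f(x) + h(x), \end{aligned}$$

where h is a simple (possibly nonsmooth) closed convex function and f is differentiable with, for all \(x,u\in \mathbb {R}^n\),

$$\begin{aligned} \Vert \nabla f(x)-\nabla f(u)\Vert \le M\Vert x-u\Vert , \qquad f(u) \ge f(x) + \langle \nabla f(x), u-x \rangle - \frac{m}{2}\Vert u-x\Vert ^2, \end{aligned}$$

i.e., \(\nabla f\) is M-Lipschitz continuous and f is m-weakly convex.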

Notes

  1. http://grouplens.org/datasets/movielens/.

  2. https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

  3. https://www.cc.gatech.edu/~hpark/nmfsoftware.html.

References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  2. Becker, S., Fadili, J.: A quasi-Newton proximal splitting method. Adv. Neural Inf. Process. Syst. 25, 2618–2626 (2012)

  3. Carmon, Y., Duchi, J.C., Hinder, O., Sidford, A.: Accelerated methods for nonconvex optimization. SIAM J. Optim. 28(2), 1751–1772 (2018)

  4. Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)

  5. Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. 178, 503–558 (2018)

  6. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156, 59–99 (2016)

  7. Ghadimi, S., Lan, G., Zhang, H.: Generalized uniformly optimal methods for nonlinear programming. J. Sci. Comput. 79(3), 1854–1881 (2019)

  8. Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In International Conference on Machine Learning, pages 37–45. PMLR (2013)

  9. Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)

  10. He, Y., Monteiro, R.D.C.: Accelerating block-decomposition first-order methods for solving composite saddle-point and two-player Nash equilibrium problems. SIAM J. Optim. 25, 2182–2211 (2015)

  11. He, Y., Monteiro, R.D.C.: An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26, 29–56 (2016)

  12. Kim, J., Park, H.: Toward faster nonnegative matrix factorization: a new algorithm and comparisons. In 2008 Eighth IEEE International Conference on Data Mining, pages 353–362. IEEE (2008)

  13. Kolossoski, O., Monteiro, R.D.C.: An accelerated non-Euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems. Optim. Methods Softw. 32, 1244–1272 (2017)

  14. Kong, W., Melo, J.G., Monteiro, R.D.C.: Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs. SIAM J. Optim. 29(4), 2566–2593 (2019)

  15. Lan, G., Lu, Z., Monteiro, R.D.C.: Primal-dual first-order methods with \(\cal{O}(1/\epsilon )\) iteration-complexity for cone programming. Math. Program. 126(1), 1–29 (2011)

  16. Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 379–387 (2015)

  17. Li, Q., Zhou, Y., Liang, Y., Varshney, P.K.: Convergence analysis of proximal gradient with momentum for nonconvex optimization. In International Conference on Machine Learning, pages 2111–2119. PMLR (2017)

  18. Liang, J., Monteiro, R.D.C.: A doubly accelerated inexact proximal point method for nonconvex composite optimization problems. Available on arXiv:1811.11378 (2018)

  19. Liang, J., Monteiro, R.D.C.: An average curvature accelerated composite gradient method for nonconvex smooth composite optimization problems. SIAM J. Optim. 31(1), 217–243 (2021)

  20. Monteiro, R.D.C., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)

  21. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O\((1/k^2)\). Doklady AN SSSR 269, 543–547 (1983)

  22. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140, 125–161 (2013)

  23. Nesterov, Y.E.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)

  24. Ouyang, Y., Chen, Y., Lan, G., Pasiliao, E., Jr.: An accelerated linearized alternating direction method of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)

  25. Paquette, C., Lin, H., Drusvyatskiy, D., Mairal, J., Harchaoui, Z.: Catalyst acceleration for gradient-based non-convex optimization. In A. Storkey and F. Perez-Cruz, editors, Proceedings of Machine Learning Research: International Conference on Artificial Intelligence and Statistics, 84, 613–622 (2018)

  26. Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Conv. Anal. 19(4), 1167–1192 (2012)

  27. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. http://www.mit.edu/~dimitrib/PTseng/papers.html (2008)

  28. Yao, Q., Kwok, J.T.: Efficient learning with a family of nonconvex regularizers by redistributing nonconvexity. J. Mach. Learn. Res. 18, 179 (2017)

  29. Yao, Q., Kwok, J. T., Gao, F., Chen, W., Liu, T.-Y.: Efficient inexact proximal gradient algorithm for nonconvex problems. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 3308–3314. IJCAI, (2017)

Acknowledgements

We are grateful to Guanghui Lan and Saeed Ghadimi for sharing the source code of the UPFAG method in [7]. We are also grateful to the two anonymous referees and the associate editor for their insightful comments which we have used to substantially improve the quality of this work.

Author information

Corresponding author

Correspondence to Jiaming Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

R. D. C. Monteiro: This work was partially supported by ONR Grant N00014-18-1-2077.

C.-K. Sim: This work is made possible through an LMS Research in Pairs (Scheme 4) grant.

Supplementary results

This section provides a bound on the quantity \( \min _{0 \le i \le k-1} \Vert y_{i+1}-\tilde{x}_i\Vert ^2 \) for the case in which the parameter \(m_0\) of the ADAP-NC-FISTA satisfies \( m_0\ge {\bar{m}} \). Note that an alternative bound on this quantity has already been developed in Proposition 3.3 for any \(m_0>0\).

Proposition A.1

Suppose \( m_0\ge {\bar{m}} \). Then, for every \( k\ge 1 \), we have

$$\begin{aligned} \frac{1}{10}\left( \sum _{i=0}^{k-1}A_{i+1}\right) \min _{0 \le i \le k-1} \Vert y_{i+1} - {\tilde{x}}_i \Vert ^2 &\le 2{\lambda }_0 A_0 \left( \phi \left( y_0\right) -\phi _*\right) +\Vert x_0-x^*\Vert ^2 \\ &\quad +{\lambda }_0D_h^2 \left( 2m_0+2m_0 k+ {\bar{m}}\sum _{i=0}^{k-1}a_i\right) . \end{aligned}$$

Proof

Using the assumption of the proposition that \( m_0\ge {\bar{m}} \), the fact that \(a_i \ge 2\) for every \( i\ge 0 \) (see the fourth remark following SUB\( (\theta ,{\lambda },m) \)), and the fact that \( \{{\lambda }_i\} \) is non-increasing (Lemma 3.1(d)), we have

$$\begin{aligned} \left( {\bar{m}}+\frac{2m_0}{a_i}\right) {\lambda }_{i+1}\le m_0 \left( 1 +\frac{2}{a_i}\right) {\lambda }_{i+1} \le 2m_0{\lambda }_i. \end{aligned}$$
(63)
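
In more detail, the chain of estimates behind (63) is (this is only a restatement of the facts quoted above, not an extra assumption):

$$\begin{aligned} \left( {\bar{m}}+\frac{2m_0}{a_i}\right) {\lambda }_{i+1} \le \left( m_0+\frac{2m_0}{a_i}\right) {\lambda }_{i+1} = m_0\left( 1+\frac{2}{a_i}\right) {\lambda }_{i+1} \le 2m_0{\lambda }_{i+1} \le 2m_0{\lambda }_i, \end{aligned}$$

where the first inequality uses \( m_0\ge {\bar{m}} \), the second uses \( a_i\ge 2 \), and the third uses the monotonicity of \( \{{\lambda }_i\} \).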

Inequality (63) implies that (34) is always satisfied with \( m=m_0 \) and \( {\lambda }={\lambda }_{i+1} \). Hence, \( m_k \) is never updated in SUB\( (\theta ,{\lambda },m) \), i.e., \( m_i=m_0 \) for every \( i\ge 0 \). Using arguments similar to those in the proof of Lemma 2.3, we conclude that for every \(i \ge 0\) and \(u \in \Omega \),

$$\begin{aligned}&2{\lambda }_{i+1} A_{i+1} \phi \left( y_{i+1}\right) + \left( 2m_0{\lambda }_{i+1}+1\right) \Vert u - x_{i+1} \Vert ^2 + \left( 1-{\lambda }_{i+1} {{{\mathcal {C}}}}_{i+1}\right) A_{i+1} \Vert y_{i+1} - {\tilde{x}}_i \Vert ^2 \nonumber \\&\quad \le 2{\lambda }_{i+1} A_i \gamma _i\left( y_i\right) + 2{\lambda }_{i+1} a_i \gamma _i(u) + \Vert u - x_i \Vert ^2, \end{aligned}$$
(64)

where

$$\begin{aligned} \gamma _i(u) := \tilde{\gamma }_i\left( y_{i+1}\right) + \frac{1}{{\lambda }_{i+1}}\langle {\tilde{x}}_i - y_{i+1}, u - y_{i+1} \rangle + \frac{m_0}{a_i} \Vert u - y_{i+1} \Vert ^2 \end{aligned}$$

and

$$\begin{aligned} \tilde{\gamma }_i(u) := \ell _f\left( u;{\tilde{x}}_i\right) + h(u) + \frac{m_0}{a_i} \Vert u - {\tilde{x}}_i \Vert ^2. \end{aligned}$$
(65)

As in Lemma 2.2(a), we have \( \gamma _i(u)\le \tilde{\gamma }_i(u) \) for every \( u\in \mathrm {dom}\,h \). Hence, it follows from (65) and (4) that for every \( i\ge 0 \) and \( u \in \mathrm {dom}\,h\), we have

$$\begin{aligned} \gamma _i(u)-\phi (u)&\le \tilde{\gamma }_i(u)-\phi (u) =\ell _f\left( u;\tilde{x}_i\right) -f(u)+\frac{m_0}{a_i}\Vert u-\tilde{x}_i\Vert ^2 \le \frac{1}{2}\left( {\bar{m}}+\frac{2m_0}{a_i}\right) \Vert u-\tilde{x}_i\Vert ^2. \end{aligned}$$
(66)
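
To see the last inequality in (66), note that (4) enters here through the lower-curvature estimate \( f(u)\ge \ell _f\left( u;{\tilde{x}}_i\right) -\frac{{\bar{m}}}{2}\Vert u-{\tilde{x}}_i\Vert ^2 \) (our reading of (4), consistent with the \({\bar{m}}\)-weak convexity of f), so that

$$\begin{aligned} \ell _f\left( u;{\tilde{x}}_i\right) -f(u)+\frac{m_0}{a_i}\Vert u-{\tilde{x}}_i\Vert ^2 \le \frac{{\bar{m}}}{2}\Vert u-{\tilde{x}}_i\Vert ^2+\frac{m_0}{a_i}\Vert u-{\tilde{x}}_i\Vert ^2 = \frac{1}{2}\left( {\bar{m}}+\frac{2m_0}{a_i}\right) \Vert u-{\tilde{x}}_i\Vert ^2. \end{aligned}$$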

Taking \( u=x^* \), and using (64), (23), (66), (63), Lemma 3.1(c), and the facts that \( x_0=y_0 \), \( {\lambda }_i\le {\lambda }_0 \) and \( \phi (y_i)\ge \phi _* \) for \( i\ge 0 \), we conclude that for every \(0 \le i \le k-1\),

$$\begin{aligned}&0.1A_{i+1} \Vert y_{i+1} - {\tilde{x}}_i \Vert ^2 - 2{\lambda }_{i} A_{i}\left( \phi \left( y_{i}\right) - \phi _*\right) - \Vert x^* - x_{i} \Vert ^2 \nonumber \\&\qquad + 2{\lambda }_{i+1}A_{i+1} \left( \phi \left( y_{i+1}\right) - \phi _*\right) + \left( 2m_0{\lambda }_{i+1}+1\right) \Vert x^* - x_{i+1} \Vert ^2 \nonumber \\&\quad \le 2{\lambda }_{i+1} A_i\left( \gamma _i\left( y_i\right) - \phi \left( y_i\right) \right) + 2{\lambda }_{i+1} a_i\left( \gamma _i\left( x^*\right) - \phi _* \right) +2\left( {\lambda }_{i+1}-{\lambda }_{i}\right) A_{i}\left( \phi (y_{i}) - \phi _*\right) \nonumber \\&\quad \le {\lambda }_{i+1}\left( {\bar{m}} + \frac{2m_0}{a_i} \right) \left( A_i\Vert y_i - \tilde{x}_i \Vert ^2 + a_i\Vert x^* - \tilde{x}_i \Vert ^2 \right) \nonumber \\&\quad \le {\lambda }_{i+1}\left( {\bar{m}} + \frac{2m_0}{a_i} \right) \left( \Vert x^* - x_i \Vert ^2 + a_i D_h^2 \right) \nonumber \\&\quad \le 2m_0{\lambda }_{i} \Vert x_i-x^* \Vert ^2 + \left( {\bar{m}} a_i+2m_0\right) {\lambda }_{i+1} D_h^2 \nonumber \\&\quad \le 2m_0{\lambda }_{i} \Vert x_i-x^* \Vert ^2 + \left( {\bar{m}} a_i+2m_0\right) {\lambda }_0 D_h^2. \end{aligned}$$

The conclusion is obtained by rearranging terms and summing the above inequality from \( i=0 \) to \( k-1 \). \(\square \)
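
For completeness, the summation step can be sketched as follows (assuming, as the statement of the bound suggests, that \( x_0,x^*\in \mathrm {dom}\,h \), so that \( \Vert x_0-x^*\Vert \le D_h \)). Setting, for this sketch only,

$$\begin{aligned} E_i := 2{\lambda }_i A_i\left( \phi \left( y_i\right) -\phi _*\right) + \left( 2m_0{\lambda }_i+1\right) \Vert x^*-x_i\Vert ^2, \end{aligned}$$

the last displayed inequality in the proof can be rewritten as \( 0.1A_{i+1}\Vert y_{i+1}-{\tilde{x}}_i\Vert ^2 + E_{i+1} \le E_i + \left( {\bar{m}} a_i+2m_0\right) {\lambda }_0 D_h^2 \). Summing this relation from \( i=0 \) to \( k-1 \), dropping \( E_k\ge 0 \), bounding the term \( 2m_0{\lambda }_0\Vert x^*-x_0\Vert ^2 \) in \( E_0 \) by \( 2m_0{\lambda }_0 D_h^2 \), and using \( \left( \sum _{i=0}^{k-1}A_{i+1}\right) \min _{0\le i\le k-1}\Vert y_{i+1}-{\tilde{x}}_i\Vert ^2 \le \sum _{i=0}^{k-1}A_{i+1}\Vert y_{i+1}-{\tilde{x}}_i\Vert ^2 \) yields the stated bound.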

About this article

Cite this article

Liang, J., Monteiro, R.D.C. & Sim, CK. A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems. Comput Optim Appl 79, 649–679 (2021). https://doi.org/10.1007/s10589-021-00280-9
