A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems

Computational Optimization and Applications

Abstract

In this paper, we describe and establish the iteration-complexity of two accelerated composite gradient (ACG) variants for solving a smooth nonconvex composite optimization problem whose objective function is the sum of a nonconvex differentiable function f with a Lipschitz continuous gradient and a simple nonsmooth closed convex function h. When f is convex, the first ACG variant reduces to the well-known FISTA for a specific choice of the input, and hence it can be viewed as a natural extension of FISTA to the nonconvex setting. The first variant requires an input pair (M, m) such that f is m-weakly convex, \(\nabla f\) is M-Lipschitz continuous, and \(m \le M\) (possibly \(m<M\)); such a pair is usually hard to obtain or poorly estimated. The second variant, on the other hand, can start from an arbitrary input pair (M, m) of positive scalars, and its complexity is shown to be no worse than, and in some cases better than, that of the first variant for a large range of input pairs. Finally, numerical results are provided to illustrate the efficiency of the two ACG variants.
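
In symbols, the setting described above is the composite problem below; this is a reader's sketch of the notation in the abstract (the precise definitions are fixed in the body of the paper, and the ambient space is written as \(\mathbb {R}^n\) here for concreteness):

$$\begin{aligned} \min _{x\in \mathbb {R}^n} \; \phi (x) := f(x) + h(x), \end{aligned}$$

where h is a simple (possibly nonsmooth) closed convex function and f is differentiable with, for all \(x,u\in \mathbb {R}^n\),

$$\begin{aligned} \Vert \nabla f(x)-\nabla f(u)\Vert \le M\Vert x-u\Vert , \qquad f(u) \ge f(x) + \langle \nabla f(x), u-x \rangle - \frac{m}{2}\Vert u-x\Vert ^2, \end{aligned}$$

i.e., \(\nabla f\) is M-Lipschitz continuous and f is m-weakly convex.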

Notes

  1. http://grouplens.org/datasets/movielens/.

  2. https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

  3. https://www.cc.gatech.edu/~hpark/nmfsoftware.html.

References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  2. Becker, S., Fadili, J.: A quasi-Newton proximal splitting method. Adv. Neural Inf. Process. Syst. 25, 2618–2626 (2012)

  3. Carmon, Y., Duchi, J.C., Hinder, O., Sidford, A.: Accelerated methods for nonconvex optimization. SIAM J. Optim. 28(2), 1751–1772 (2018)

  4. Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)

  5. Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. 178, 503–558 (2018)

  6. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156, 59–99 (2016)

  7. Ghadimi, S., Lan, G., Zhang, H.: Generalized uniformly optimal methods for nonlinear programming. J. Sci. Comput. 79(3), 1854–1881 (2019)

  8. Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In International Conference on Machine Learning, pages 37–45. PMLR (2013)

  9. Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)

  10. He, Y., Monteiro, R.D.C.: Accelerating block-decomposition first-order methods for solving composite saddle-point and two-player Nash equilibrium problems. SIAM J. Optim. 25, 2182–2211 (2015)

  11. He, Y., Monteiro, R.D.C.: An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26, 29–56 (2016)

  12. Kim, J., Park, H.: Toward faster nonnegative matrix factorization: a new algorithm and comparisons. In 2008 Eighth IEEE International Conference on Data Mining, pages 353–362. IEEE (2008)

  13. Kolossoski, O., Monteiro, R.D.C.: An accelerated non-Euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems. Optim. Methods Softw. 32, 1244–1272 (2017)

  14. Kong, W., Melo, J.G., Monteiro, R.D.C.: Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs. SIAM J. Optim. 29(4), 2566–2593 (2019)

  15. Lan, G., Lu, Z., Monteiro, R.D.C.: Primal-dual first-order methods with \(\cal{O}(1/\epsilon )\) iteration-complexity for cone programming. Math. Program. 126(1), 1–29 (2011)

  16. Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 379–387 (2015)

  17. Li, Q., Zhou, Y., Liang, Y., Varshney, P.K.: Convergence analysis of proximal gradient with momentum for nonconvex optimization. In International Conference on Machine Learning, pages 2111–2119. PMLR (2017)

  18. Liang, J., Monteiro, R.D.C.: A doubly accelerated inexact proximal point method for nonconvex composite optimization problems. Available on arXiv:1811.11378 (2018)

  19. Liang, J., Monteiro, R.D.C.: An average curvature accelerated composite gradient method for nonconvex smooth composite optimization problems. SIAM J. Optim. 31(1), 217–243 (2021)

  20. Monteiro, R.D.C., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)

  21. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O\((1/k^2)\). Doklady AN SSSR 269, 543–547 (1983)

  22. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140, 125–161 (2013)

  23. Nesterov, Y.E.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)

  24. Ouyang, Y., Chen, Y., Lan, G., Pasiliao, E., Jr.: An accelerated linearized alternating direction method of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)

  25. Paquette, C., Lin, H., Drusvyatskiy, D., Mairal, J., Harchaoui, Z.: Catalyst acceleration for gradient-based non-convex optimization. In A. Storkey and F. Perez-Cruz, editors, Proceedings of Machine Learning Research: International Conference on Artificial Intelligence and Statistics, 84, 613–622 (2018)

  26. Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Conv. Anal. 19(4), 1167–1192 (2012)

  27. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. http://www.mit.edu/~dimitrib/PTseng/papers.html (2008)

  28. Yao, Q., Kwok, J.T.: Efficient learning with a family of nonconvex regularizers by redistributing nonconvexity. J. Mach. Learn. Res. 18, 179 (2017)

  29. Yao, Q., Kwok, J. T., Gao, F., Chen, W., Liu, T.-Y.: Efficient inexact proximal gradient algorithm for nonconvex problems. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 3308–3314. IJCAI, (2017)

Acknowledgements

We are grateful to Guanghui Lan and Saeed Ghadimi for sharing the source code of the UPFAG method in [7]. We are also grateful to the two anonymous referees and the associate editor for their insightful comments which we have used to substantially improve the quality of this work.

Author information

Corresponding author

Correspondence to Jiaming Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

R. D. C. Monteiro: This work was partially supported by ONR Grant N00014-18-1-2077.

C.-K. Sim: This work is made possible through an LMS Research in Pairs (Scheme 4) grant.

Supplementary results

This section provides a bound on the quantity \( \min _{0 \le i \le k-1} \Vert y_{i+1}-\tilde{x}_i\Vert ^2 \) for the case in which the parameter \(m_0\) of the ADAP-NC-FISTA satisfies \( m_0\ge {\bar{m}} \). Note that an alternative bound on this quantity has already been developed in Proposition 3.3 for any \(m_0>0\).

Proposition A.1

Suppose \( m_0\ge {\bar{m}} \). Then, for every \( k\ge 1 \), we have

$$\begin{aligned} \frac{1}{10}\left( \sum _{i=0}^{k-1}A_{i+1}\right) \min _{0 \le i \le k-1} \Vert y_{i+1} - {\tilde{x}}_i \Vert ^2 &\le 2{\lambda }_0 A_0 \left( \phi \left( y_0\right) -\phi _*\right) +\Vert x_0-x^*\Vert ^2 \\ &\quad +{\lambda }_0D_h^2 \left( 2m_0+2m_0 k+ {\bar{m}}\sum _{i=0}^{k-1}a_i\right) . \end{aligned}$$

Proof

Using the assumption of the proposition that \( m_0\ge {\bar{m}} \), the fact that \(a_i \ge 2\) for every \( i\ge 0 \) (see the fourth remark following SUB\( (\theta ,{\lambda },m) \)), and the fact that \( \{{\lambda }_i\} \) is non-increasing (Lemma 3.1(d)), we have

$$\begin{aligned} \left( {\bar{m}}+\frac{2m_0}{a_i}\right) {\lambda }_{i+1}\le m_0 \left( 1 +\frac{2}{a_i}\right) {\lambda }_{i+1} \le 2m_0{\lambda }_i. \end{aligned}$$
(63)
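
In more detail, the chain of estimates behind (63) is (this is only a restatement of the facts quoted above, not an extra assumption):

$$\begin{aligned} \left( {\bar{m}}+\frac{2m_0}{a_i}\right) {\lambda }_{i+1} \le \left( m_0+\frac{2m_0}{a_i}\right) {\lambda }_{i+1} = m_0\left( 1+\frac{2}{a_i}\right) {\lambda }_{i+1} \le 2m_0{\lambda }_{i+1} \le 2m_0{\lambda }_i, \end{aligned}$$

where the first inequality uses \( m_0\ge {\bar{m}} \), the second uses \( a_i\ge 2 \), and the third uses the monotonicity of \( \{{\lambda }_i\} \).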

Inequality (63) implies that (34) is always satisfied with \( m=m_0 \) and \( {\lambda }={\lambda }_{i+1} \). Hence, \( m_k \) is never updated in SUB\( (\theta ,{\lambda },m) \), i.e., \( m_i=m_0 \) for every \( i\ge 0 \). Using arguments similar to those in the proof of Lemma 2.3, we conclude that for every \(i \ge 0\) and \(u \in \Omega \),

$$\begin{aligned}&2{\lambda }_{i+1} A_{i+1} \phi \left( y_{i+1}\right) + \left( 2m_0{\lambda }_{i+1}+1\right) \Vert u - x_{i+1} \Vert ^2 + \left( 1-{\lambda }_{i+1} {{{\mathcal {C}}}}_{i+1}\right) A_{i+1} \Vert y_{i+1} - {\tilde{x}}_i \Vert ^2 \nonumber \\&\quad \le 2{\lambda }_{i+1} A_i \gamma _i\left( y_i\right) + 2{\lambda }_{i+1} a_i \gamma _i(u) + \Vert u - x_i \Vert ^2, \end{aligned}$$
(64)

where

$$\begin{aligned} \gamma _i(u) := \tilde{\gamma }_i\left( y_{i+1}\right) + \frac{1}{{\lambda }_{i+1}}\langle {\tilde{x}}_i - y_{i+1}, u - y_{i+1} \rangle + \frac{m_0}{a_i} \Vert u - y_{i+1} \Vert ^2 \end{aligned}$$

and

$$\begin{aligned} \tilde{\gamma }_i(u) := \ell _f\left( u;{\tilde{x}}_i\right) + h(u) + \frac{m_0}{a_i} \Vert u - {\tilde{x}}_i \Vert ^2. \end{aligned}$$
(65)

As in Lemma 2.2(a), we have \( \gamma _i(u)\le \tilde{\gamma }_i(u) \) for every \( u\in \mathrm {dom}\,h \). Hence, it follows from (65) and (4) that for every \( i\ge 0 \) and \( u \in \mathrm {dom}\,h\), we have

$$\begin{aligned} \gamma _i(u)-\phi (u)&\le \tilde{\gamma }_i(u)-\phi (u) =\ell _f\left( u;\tilde{x}_i\right) -f(u)+\frac{m_0}{a_i}\Vert u-\tilde{x}_i\Vert ^2 \le \frac{1}{2}\left( {\bar{m}}+\frac{2m_0}{a_i}\right) \Vert u-\tilde{x}_i\Vert ^2. \end{aligned}$$
(66)
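
To see the last inequality in (66), note that (4) enters here through the lower-curvature estimate \( f(u)\ge \ell _f\left( u;{\tilde{x}}_i\right) -\frac{{\bar{m}}}{2}\Vert u-{\tilde{x}}_i\Vert ^2 \) (our reading of (4), consistent with the \({\bar{m}}\)-weak convexity of f), so that

$$\begin{aligned} \ell _f\left( u;{\tilde{x}}_i\right) -f(u)+\frac{m_0}{a_i}\Vert u-{\tilde{x}}_i\Vert ^2 \le \frac{{\bar{m}}}{2}\Vert u-{\tilde{x}}_i\Vert ^2+\frac{m_0}{a_i}\Vert u-{\tilde{x}}_i\Vert ^2 = \frac{1}{2}\left( {\bar{m}}+\frac{2m_0}{a_i}\right) \Vert u-{\tilde{x}}_i\Vert ^2. \end{aligned}$$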

Taking \( u=x^* \), and using (64), (23), (66), (63), Lemma 3.1(c), and the facts that \( x_0=y_0 \), \( {\lambda }_i\le {\lambda }_0 \) and \( \phi (y_i)\ge \phi _* \) for \( i\ge 0 \), we conclude that for every \(0 \le i \le k-1\),

$$\begin{aligned}&0.1A_{i+1} \Vert y_{i+1} - {\tilde{x}}_i \Vert ^2 - 2{\lambda }_{i} A_{i}\left( \phi \left( y_{i}\right) - \phi _*\right) - \Vert x^* - x_{i} \Vert ^2 \nonumber \\&\qquad + 2{\lambda }_{i+1}A_{i+1} \left( \phi \left( y_{i+1}\right) - \phi _*\right) + \left( 2m_0{\lambda }_{i+1}+1\right) \Vert x^* - x_{i+1} \Vert ^2 \nonumber \\&\quad \le 2{\lambda }_{i+1} A_i\left( \gamma _i\left( y_i\right) - \phi \left( y_i\right) \right) + 2{\lambda }_{i+1} a_i\left( \gamma _i\left( x^*\right) - \phi _* \right) +2\left( {\lambda }_{i+1}-{\lambda }_{i}\right) A_{i}\left( \phi (y_{i}) - \phi _*\right) \nonumber \\&\quad \le {\lambda }_{i+1}\left( {\bar{m}} + \frac{2m_0}{a_i} \right) \left( A_i\Vert y_i - \tilde{x}_i \Vert ^2 + a_i\Vert x^* - \tilde{x}_i \Vert ^2 \right) \nonumber \\&\quad \le {\lambda }_{i+1}\left( {\bar{m}} + \frac{2m_0}{a_i} \right) \left( \Vert x^* - x_i \Vert ^2 + a_i D_h^2 \right) \nonumber \\&\quad \le 2m_0{\lambda }_{i} \Vert x_i-x^* \Vert ^2 + \left( {\bar{m}} a_i+2m_0\right) {\lambda }_{i+1} D_h^2 \nonumber \\&\quad \le 2m_0{\lambda }_{i} \Vert x_i-x^* \Vert ^2 + \left( {\bar{m}} a_i+2m_0\right) {\lambda }_0 D_h^2. \end{aligned}$$

The conclusion is obtained by rearranging terms and summing the above inequality from \( i=0 \) to \( k-1 \). \(\square \)
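
For completeness, the summation step can be sketched as follows (assuming, as the statement of the bound suggests, that \( x_0,x^*\in \mathrm {dom}\,h \), so that \( \Vert x_0-x^*\Vert \le D_h \)). Setting, for this sketch only,

$$\begin{aligned} E_i := 2{\lambda }_i A_i\left( \phi \left( y_i\right) -\phi _*\right) + \left( 2m_0{\lambda }_i+1\right) \Vert x^*-x_i\Vert ^2, \end{aligned}$$

the last displayed inequality in the proof can be rewritten as \( 0.1A_{i+1}\Vert y_{i+1}-{\tilde{x}}_i\Vert ^2 + E_{i+1} \le E_i + \left( {\bar{m}} a_i+2m_0\right) {\lambda }_0 D_h^2 \). Summing this relation from \( i=0 \) to \( k-1 \), dropping \( E_k\ge 0 \), bounding the term \( 2m_0{\lambda }_0\Vert x^*-x_0\Vert ^2 \) in \( E_0 \) by \( 2m_0{\lambda }_0 D_h^2 \), and using \( \left( \sum _{i=0}^{k-1}A_{i+1}\right) \min _{0\le i\le k-1}\Vert y_{i+1}-{\tilde{x}}_i\Vert ^2 \le \sum _{i=0}^{k-1}A_{i+1}\Vert y_{i+1}-{\tilde{x}}_i\Vert ^2 \) yields the stated bound.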

About this article

Cite this article

Liang, J., Monteiro, R.D.C. & Sim, CK. A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems. Comput Optim Appl 79, 649–679 (2021). https://doi.org/10.1007/s10589-021-00280-9
