Abstract
In this paper, we describe and establish the iteration-complexity of two accelerated composite gradient (ACG) variants for solving a smooth nonconvex composite optimization problem whose objective function is the sum of a nonconvex differentiable function f with a Lipschitz continuous gradient and a simple nonsmooth closed convex function h. When f is convex, the first ACG variant reduces to the well-known FISTA for a specific choice of its input, and hence it can be viewed as a natural extension of FISTA to the nonconvex setting. The first variant requires an input pair (M, m) such that f is m-weakly convex, \(\nabla f\) is M-Lipschitz continuous, and \(m \le M\) (possibly \(m < M\)); such a pair is usually hard to obtain or poorly estimated. The second variant, on the other hand, can start from an arbitrary input pair (M, m) of positive scalars, and its complexity is shown to be no worse than, and in some cases better than, that of the first variant for a large range of input pairs. Finally, numerical results are provided to illustrate the efficiency of the two ACG variants.
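For readers less familiar with FISTA, the Python sketch below records the classical convex scheme of Beck and Teboulle [1], to which the first ACG variant reduces when f is convex. It is a minimal illustration under assumed interfaces (the names grad_f and prox_h and the fixed stepsize 1/M are ours, not the paper's), and it omits the weak-convexity parameter m that drives the nonconvex variants studied here; the LASSO instance at the end is likewise only a convex toy example.

```python
import numpy as np

def fista(grad_f, prox_h, x0, M, num_iters=100):
    # Classical FISTA (Beck and Teboulle [1]); a minimal sketch, not the
    # paper's nonconvex variants.  Assumed (hypothetical) interface:
    # grad_f(x) returns the gradient of f at x, and prox_h(z, lam) returns
    # argmin_u { lam*h(u) + 0.5*||u - z||^2 }.
    lam = 1.0 / M                       # stepsize from the Lipschitz constant M
    y = np.asarray(x0, dtype=float)     # main iterate
    x_tilde = y.copy()                  # extrapolated point
    t = 1.0                             # momentum parameter
    for _ in range(num_iters):
        # Composite (prox) gradient step at the extrapolated point.
        y_next = prox_h(x_tilde - lam * grad_f(x_tilde), lam)
        # Nesterov momentum recursion and extrapolation.
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        x_tilde = y_next + ((t - 1.0) / t_next) * (y_next - y)
        y, t = y_next, t_next
    return y

# Toy usage: LASSO with f(x) = 0.5*||Ax - b||^2 and h(x) = mu*||x||_1.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b, mu = rng.standard_normal((40, 100)), rng.standard_normal(40), 0.1
    grad_f = lambda x: A.T @ (A @ x - b)
    prox_h = lambda z, lam: np.sign(z) * np.maximum(np.abs(z) - lam * mu, 0.0)
    M = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of grad f
    x_opt = fista(grad_f, prox_h, np.zeros(100), M)
```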
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Becker, S., Fadili, J.: A quasi-Newton proximal splitting method. Adv. Neural Inf. Process. Syst. 25, 2618–2626 (2012)
Carmon, Y., Duchi, J.C., Hinder, O., Sidford, A.: Accelerated methods for nonconvex optimization. SIAM J. Optim. 28(2), 1751–1772 (2018)
Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. 178, 503–558 (2018)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156, 59–99 (2016)
Ghadimi, S., Lan, G., Zhang, H.: Generalized uniformly optimal methods for nonlinear programming. J. Sci. Comput. 79(3), 1854–1881 (2019)
Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In International Conference on Machine Learning, pages 37–45. PMLR (2013)
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)
He, Y., Monteiro, R.D.C.: Accelerating block-decomposition first-order methods for solving composite saddle-point and two-player Nash equilibrium problems. SIAM J. Optim. 25, 2182–2211 (2015)
He, Y., Monteiro, R.D.C.: An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26, 29–56 (2016)
Kim, J., Park, H.: Toward faster nonnegative matrix factorization: a new algorithm and comparisons. In 2008 Eighth IEEE International Conference on Data Mining, pages 353–362. IEEE (2008)
Kolossoski, O., Monteiro, R.D.C.: An accelerated non-Euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems. Optim. Methods Softw. 32, 1244–1272 (2017)
Kong, W., Melo, J.G., Monteiro, R.D.C.: Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs. SIAM J. Optim. 29(4), 2566–2593 (2019)
Lan, G., Lu, Z., Monteiro, R.D.C.: Primal-dual first-order methods with \(\cal{O}(1/\epsilon )\) iteration-complexity for cone programming. Math. Program. 126(1), 1–29 (2011)
Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 379–387 (2015)
Li, Q., Zhou, Y., Liang, Y., Varshney, P.K.: Convergence analysis of proximal gradient with momentum for nonconvex optimization. In International Conference on Machine Learning, pages 2111–2119. PMLR (2017)
Liang, J., Monteiro, R.D.C.: A doubly accelerated inexact proximal point method for nonconvex composite optimization problems. Available on arXiv:1811.11378 (2018)
Liang, J., Monteiro, R.D.C.: An average curvature accelerated composite gradient method for nonconvex smooth composite optimization problems. SIAM J. Optim. 31(1), 217–243 (2021)
Monteiro, R.D.C., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\). Doklady AN SSSR 269, 543–547 (1983)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140, 125–161 (2013)
Nesterov, Y.E.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)
Ouyang, Y., Chen, Y., Lan, G., Pasiliao, E., Jr.: An accelerated linearized alternating direction method of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)
Paquette, C., Lin, H., Drusvyatskiy, D., Mairal, J., Harchaoui, Z.: Catalyst acceleration for gradient-based non-convex optimization. In A. Storkey and F. Perez-Cruz, editors, Proceedings of Machine Learning Research: International Conference on Artificial Intelligence and Statistics, 84, 613–622 (2018)
Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Conv. Anal. 19(4), 1167–1192 (2012)
Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. http://www.mit.edu/~dimitrib/PTseng/papers.html (2008)
Yao, Q., Kwok, J.T.: Efficient learning with a family of nonconvex regularizers by redistributing nonconvexity. J. Mach. Learn. Res. 18, 179 (2017)
Yao, Q., Kwok, J.T., Gao, F., Chen, W., Liu, T.-Y.: Efficient inexact proximal gradient algorithm for nonconvex problems. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 3308–3314. IJCAI (2017)
Acknowledgements
We are grateful to Guanghui Lan and Saeed Ghadimi for sharing the source code of the UPFAG method in [7]. We are also grateful to the two anonymous referees and the associate editor for their insightful comments which we have used to substantially improve the quality of this work.
Additional information
R. D. C. Monteiro: This work was partially supported by ONR Grant N00014-18-1-2077.
C.-K. Sim: This work was made possible through an LMS Research in Pairs (Scheme 4) grant.
Supplementary results
This section provides a bound on the quantity \( \min _{0 \le i \le k-1} \Vert y_{i+1}-\tilde{x}_i\Vert ^2 \) for the case in which the parameter \(m_0\) of the ADAP-NC-FISTA satisfies \( m_0\ge {\bar{m}} \). Note that an alternative bound on this quantity has already been developed in Proposition 3.3 for any \(m_0>0\).
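For context, the quantity above serves as a stationarity surrogate. Assuming, as in the ACG schemes of this paper, that \(y_{i+1}\) is the composite (prox) gradient step from \(\tilde{x}_i\) with stepsize \({\lambda}_i\), the following standard argument (stated here for orientation, not taken verbatim from the paper) applies: the optimality condition of the prox subproblem defining \(y_{i+1}\) gives

\[ v_{i+1} := \nabla f(y_{i+1}) - \nabla f(\tilde{x}_i) + \frac{1}{{\lambda}_i}\,(\tilde{x}_i - y_{i+1}) \in \nabla f(y_{i+1}) + \partial h(y_{i+1}), \qquad \Vert v_{i+1}\Vert \le \left( \bar{M} + \frac{1}{{\lambda}_i} \right) \Vert y_{i+1} - \tilde{x}_i\Vert , \]

where \(\bar{M}\) denotes a Lipschitz constant of \(\nabla f\). Hence a bound on \( \min_{0 \le i \le k-1} \Vert y_{i+1}-\tilde{x}_i\Vert ^2 \) translates directly into a bound on the stationarity residual of the best iterate.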
Proposition A.1
Suppose \( m_0\ge {\bar{m}} \). Then, for every \( k\ge 1 \), we have
Proof
Using the assumption of the proposition that \( m_0\ge {\bar{m}} \), the fact that \( a_i \ge 2 \) for \( i\ge 0 \) (see the fourth remark following SUB\( (\theta ,{\lambda },m) \)), and the fact that \( \{{\lambda }_i\} \) is non-increasing (Lemma 3.1(d)), we have
The above inequality implies that (34) is always satisfied with \( m=m_0 \) and \( {\lambda }={\lambda }_{i+1} \). Hence, \( m_k \) is never updated in SUB\( (\theta ,{\lambda },m) \), i.e., \( m_i=m_0 \) for every \( i\ge 0 \). Using arguments similar to those in the proof of Lemma 2.3, we conclude that for every \(i \ge 0\) and \(u \in \Omega \),
where
and
As in Lemma 2.2(a), we have \( \gamma _i(u)\le \tilde{\gamma }_i(u) \) for every \( u\in \mathrm {dom}\,h \). Hence, it follows from (65) and (4) that for every \( k\ge 0 \) and \( u \in \mathrm {dom}\,h\), we have
Taking \( u=x^* \), and using (64), (23), (66), (63), Lemma 3.1(c), and the facts that \( x_0=y_0 \), \( {\lambda }_i\le {\lambda }_0 \) and \( \phi (y_i)\ge \phi _* \) for \( i\ge 0 \), we conclude that for every \(0 \le i \le k-1\),
The conclusion is obtained by rearranging terms and summing the above inequality from \( i=0 \) to \( k-1 \). \(\square \)
Cite this article
Liang, J., Monteiro, R.D.C. & Sim, C.-K. A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems. Comput Optim Appl 79, 649–679 (2021). https://doi.org/10.1007/s10589-021-00280-9