Abstract
We study the linear convergence of several first-order methods, namely the proximal gradient method (PGM), the proximal alternating linearized minimization (PALM) algorithm, and the randomized block coordinate proximal gradient method (R-BCPGM), for minimizing the sum of a smooth convex function and a nonsmooth convex function. We introduce a new analytic framework based on error bounds, calmness, metric subregularity, and bounded metric subregularity. This variational analysis perspective enables us to provide concrete sufficient conditions for linear convergence and applicable approaches for calculating linear convergence rates of these first-order methods for a class of structured convex problems. In particular, for the LASSO, the fused LASSO, and the group LASSO, these conditions are satisfied automatically, and the modulus of calmness/metric subregularity is computable. Consequently, linear convergence of the first-order methods for these important applications is automatically guaranteed and the convergence rates can be calculated. The new perspective enables us to improve some existing results and to obtain results previously unknown in the literature. In particular, we improve the results on the linear convergence of the PGM and PALM for structured convex problems with a computable error bound estimate. Moreover, for the R-BCPGM applied to structured convex problems, we prove that linear convergence is ensured when the nonsmooth part of the objective function is the group LASSO regularizer.
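For concreteness, here is a minimal sketch (not taken from the paper) of the proximal gradient method applied to the LASSO problem \(\min_x \frac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1\), one of the structured problems to which the linear convergence results apply. The function names, the step size \(1/L\) with \(L = \|A\|_2^2\), and the iteration count are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient_lasso(A, b, lam, num_iters=500):
    # Proximal gradient (forward-backward) sketch for 0.5*||Ax - b||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)           # gradient of the smooth part
        x = soft_threshold(x - grad / L, lam / L)  # proximal step
    return x
```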
Acknowledgments
We are grateful to two anonymous referees for their suggestions and comments which have helped us improve the paper substantially.
The research was partially supported by NSERC, the General Research Fund of the Research Grants Council of Hong Kong (12302318), the National Natural Science Foundation of China (11971220), and the Natural Science Foundation of Guangdong Province (2019A1515011152).
Appendix
Proof of Theorem 1
Since \(L_{i_k}\) is the Lipschitz constant of \(\nabla_{i_k} f\), \(c^k_{i} \ge L_i\), and \(x^{k+1}_j = x^k_j\) for all \(j \neq i_k\), we have
which implies that
Combining this with the iteration scheme (3), we have
where \(t_{i_k}:=x_{i_k}-x^k_{i_k}\). Recall that for a given iteration point \(x^k\), the next iteration point \(x^{k+1}\) is obtained by using the scheme (3), where the index \(i_k\) is chosen from \(\{1,\dots, N\}\) uniformly at random. Conditioning on \(x^k\) and taking expectation with respect to the random index \(i_k\), we obtain
Now we estimate the right-hand side of the above inequality:
where \(t:=(t_1,\dots, t_N)\) and \(F_{C}(x) := \min_{y}\left\{ f(x) + \langle \nabla f(x), y - x \rangle + \frac{C}{2}\|y - x\|^2 + g(y) \right\}\).
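As a side remark (this identity is ours, not part of the original display, but it follows from a standard completion of squares), the inner minimization defining \(F_C\) is attained at the usual proximal gradient point:
\[
\operatorname*{argmin}_{y}\Big\{ f(x) + \langle \nabla f(x), y-x\rangle + \tfrac{C}{2}\|y-x\|^2 + g(y) \Big\}
= \operatorname{prox}_{g/C}\!\Big(x - \tfrac{1}{C}\nabla f(x)\Big),
\qquad \operatorname{prox}_{g/C}(z) := \operatorname*{argmin}_{y}\Big\{ g(y) + \tfrac{C}{2}\|y-z\|^2 \Big\}.
\]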
Furthermore, we set
It follows immediately that
which yields that
and hence
Let \(\tilde{x}:= \text{Proj}_{\mathcal{X}}(x)\) for any \(x\), so that \(f(\tilde{x}) + g(\tilde{x}) = F^{*}\). Then we have
where \(L\) is the Lipschitz constant of \(\nabla f\). Plugging \(x = x^k\) into the above inequalities, we have
where the second inequality follows from (14) and the third inequality is a direct consequence of (31). Then we have
By (29), (30) and (32), we have
therefore
For any \(l \ge 1\), combining the above inequality over \(k = 0, 1, \dots, l-1\) and taking expectation with respect to the whole history, we obtain
where \(\sigma = 1 - \frac{1}{N(1+\kappa(1+L/C))} \in (0,1)\), and hence the R-BCPGM achieves a linear convergence rate in terms of the expected objective value. □
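For readers who wish to experiment, the following is a minimal, illustrative sketch (not taken from the paper) of the randomized block coordinate proximal gradient iteration analyzed above, specialized to a least-squares loss with the group LASSO regularizer. The function names, the block constant `c_i`, and the partition `blocks` are our own illustrative choices; the paper's scheme (3) is stated for general smooth \(f\) and block-separable nonsmooth \(g\).

```python
import numpy as np

def block_soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_2 on a single block (group LASSO prox).
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= tau else (1.0 - tau / nrm) * v

def rbcpgm(A, b, lam, blocks, num_iters=1000, rng=None):
    # Sketch of R-BCPGM for 0.5*||Ax - b||^2 + lam * sum_i ||x_i||_2, with the
    # coordinates of x partitioned into `blocks`. At each iteration one block i_k
    # is drawn uniformly at random and a proximal gradient step is taken on that
    # block only, mirroring the scheme analyzed in the proof above.
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        idx = blocks[rng.integers(len(blocks))]
        c_i = max(np.linalg.norm(A[:, idx], 2) ** 2, 1e-12)  # block Lipschitz constant
        grad_i = A[:, idx].T @ (A @ x - b)                    # partial gradient
        x[idx] = block_soft_threshold(x[idx] - grad_i / c_i, lam / c_i)
    return x
```

Here `blocks` is a hypothetical partition of the coordinates into groups, e.g. `blocks = [np.arange(0, 5), np.arange(5, 10)]`.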
Keywords
- Metric subregularity
- Calmness
- Proximal gradient method
- Proximal alternating linearized minimization
- Randomized block coordinate proximal gradient method
- Linear convergence
- Variational analysis
- Machine learning
- Statistics