
Variational Analysis Perspective on Linear Convergence of Some First Order Methods for Nonsmooth Convex Optimization Problems


Abstract

We study the linear convergence of some first-order methods, such as the proximal gradient method (PGM), the proximal alternating linearized minimization (PALM) algorithm and the randomized block coordinate proximal gradient method (R-BCPGM), for minimizing the sum of a smooth convex function and a nonsmooth convex function. We introduce a new analytic framework based on error bounds, calmness, metric subregularity and bounded metric subregularity. This variational analysis perspective enables us to provide concrete sufficient conditions for linear convergence and applicable approaches for calculating the linear convergence rates of these first-order methods for a class of structured convex problems. In particular, for the LASSO, the fused LASSO and the group LASSO, these conditions are satisfied automatically and the modulus of calmness/metric subregularity is computable. Consequently, the linear convergence of the first-order methods for these important applications is automatically guaranteed and the convergence rates can be calculated. The new perspective also enables us to improve some existing results and to obtain results that are new to the literature. In particular, we improve the result on the linear convergence of the PGM and PALM for the structured convex problem, with a computable error bound estimation. Moreover, for the R-BCPGM applied to the structured convex problem, we prove that linear convergence is ensured when the nonsmooth part of the objective function is the group LASSO regularizer.



Acknowledgments

We are grateful to two anonymous referees for their suggestions and comments which have helped us improve the paper substantially.

Author information


Corresponding author

Correspondence to Xiaoming Yuan.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research was partially supported by NSERC, the General Research Fund from the Research Grants Council of Hong Kong (12302318), the National Science Foundation of China (11971220), and the Natural Science Foundation of Guangdong Province (2019A1515011152).

Appendix


Proof of Theorem 1

Since \(L_{i_k}\) is the Lipschitz constant of \(\nabla_{i_k} f\), \(c^k_i \ge L_i\) for all \(i\), and \(x^{k+1}_j = x^k_j\) for all \(j\neq i_k\), we have

$$ f(x^{k+1}) \le f(x^{k}) + \left\langle \nabla_{i_{k}} f(x^{k}), x^{k+1}_{i_{k}} - x^{k}_{i_{k}} \right\rangle + \frac{c_{i_{k}}^{k}}{2} \| x^{k+1}_{i_{k}} - x^{k}_{i_{k}} \|^{2}, $$

which implies that

$$ F(x^{k+1}) \le F(x^{k}) + \left\langle \nabla_{i_{k}} f(x^{k}), x^{k+1}_{i_{k}} - x^{k}_{i_{k}} \right\rangle + \frac{c_{i_{k}}^{k}}{2} \| x^{k+1}_{i_{k}} - x^{k}_{i_{k}} \|^{2} + g_{i_{k}}(x_{i_{k}}^{k+1}) - g_{i_{k}}(x_{i_{k}}^{k}). $$

Combining this with the iteration scheme (3), we have

$$ F(x^{k+1}) - F(x^{k}) \le \min_{t_{i_{k}}}\left\{ \left\langle \nabla_{i_{k}} f(x^{k}), t_{i_{k}} \right\rangle + \frac{c_{i_{k}}^{k}}{2}t_{i_{k}}^{2} + g_{i_{k}}(x_{i_{k}}^{k}+t_{i_{k}}) - g_{i_{k}}(x_{i_{k}}^{k}) \right\}, $$

where \(t_{i_k}:=x_{i_k}-x^k_{i_k}\). Recall that for a given iteration point \(x^{k}\), the next iteration point \(x^{k+1}\) is obtained by using the scheme (3), where the index \(i_k\) is randomly chosen from \(\{1,\dots , N\}\) with the uniform probability distribution. Conditioning on \(x^{k}\) and taking expectation with respect to the random index \(i_k\), we obtain

$$ \begin{array}{@{}rcl@{}} \lefteqn{\mathbb{E}[F(x^{k+1}) - F(x^{k}) ~|~ x^{k} ] } \\ && \le \mathbb{E}\left\{\min_{t_{i_{k}}}\left\langle \nabla_{i_{k}} f(x^{k}), t_{i_{k}} \right\rangle + \frac{c_{i_{k}}^{k}}{2}t_{i_{k}}^{2} + g_{i_{k}}(x_{i_{k}}^{k}+t_{i_{k}}) - g_{i_{k}}(x_{i_{k}}^{k}) ~\Big|~ x^{k} \right\}. \end{array} $$
(29)

We now estimate the right-hand side of the above inequality:

$$ \begin{array}{@{}rcl@{}} && ~~~~ \mathbb{E}\left\{\min_{t_{i_{k}}}\left\langle \nabla_{i_{k}} f(x^{k}), t_{i_{k}} \right\rangle + \frac{c_{i_{k}}^{k}}{2}t_{i_{k}}^{2} + g_{i_{k}}(x_{i_{k}}^{k}+t_{i_{k}}) - g_{i_{k}}(x_{i_{k}}^{k}) ~\Big|~ x^{k} \right\}\\ &=& \frac{1}{N}{\sum}_{i = 1}^{N} \left \{ \min_{t_{i}}\left\langle \nabla_{i} f(x^{k}), t_{i} \right\rangle + \frac{{c_{i}^{k}}}{2}{t_{i}^{2}} + g_{i}({x_{i}^{k}}+t_{i}) - g_{i}({x_{i}^{k}})\right \} \\ &\le& \frac{1}{N} \min_{t}{\sum}_{i = 1}^{N} \left\{\left\langle \nabla_{i} f(x^{k}), t_{i} \right\rangle + \frac{C}{2}{t_{i}^{2}} + g_{i}({x_{i}^{k}}+t_{i}) - g_{i}({x_{i}^{k}}) \right\}\\ &=& \frac{1}{N} \min_{y}\left\{ \left\langle \nabla f(x^{k}), y-x^{k} \right\rangle + \frac{C}{2}\|y-x^{k}\|^{2} + g(y) - g(x^{k})\right\} \\ & =& \frac{1}{N} (F_{C}(x^{k}) - F(x^{k})), \end{array} $$
(30)

where \(t\!:=\!(t_1,\dots , t_N)\) and \(F_{C}(x) := \min \limits _{y}\left \{ f(x) + \left \langle \nabla f(x), y - x \right \rangle + \frac {C}{2}\|y - x\|^2 + g(y) \right \} \).
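To make this blockwise subproblem concrete, here is a minimal sketch of a single block update of the form appearing in (29)–(30). It assumes, purely for illustration, that each \(g_i\) is a scaled \(\ell_1\) norm \(g_i(\cdot)=\lambda\|\cdot\|_1\), so the minimizer is obtained by a gradient step on the block followed by soft-thresholding; the function names and this choice of \(g_i\) are ours, not the paper's.

```python
import numpy as np

def soft_threshold(z, tau):
    # Closed-form minimizer of 0.5*||y - z||^2 + tau*||y||_1 (coordinatewise).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def block_update(x, grad_i, idx, c_i, lam):
    """One block update of scheme (3), sketched for g_i = lam*||.||_1:
    minimize over t:  <grad_i, t> + (c_i/2)*||t||^2 + g_i(x[idx] + t),
    whose solution is x[idx] + t* = soft_threshold(x[idx] - grad_i/c_i, lam/c_i)."""
    x_new = x.copy()
    x_new[idx] = soft_threshold(x[idx] - grad_i / c_i, lam / c_i)
    return x_new
```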

Furthermore, we set

$$ \begin{array}{@{}rcl@{}} \hat{x}^{k} &:=& (I + \frac{1}{C} \partial g)^{-1} \left( x^{k} - \frac{1}{C} \nabla f(x^{k}) \right) \\ &=& \arg\min_{y}\left\{ \left\langle \nabla f(x^{k}), y-x^{k} \right\rangle + \frac{C}{2}\|y-x^{k}\|^{2} + g(y)\right\}. \end{array} $$

It follows immediately from the optimality condition for this minimization problem and the convexity of g that

$$ g \left( x^{k} \right) \ge g \left( \hat{x}^{k} \right) - \left\langle \nabla f(x^{k}) + C(\hat{x}^{k} - x^{k}), x^{k} - \hat{x}^{k} \right\rangle, $$

which yields

$$ f \left( x^{k} \right) + g \left( x^{k} \right) \ge f \left( x^{k} \right) + \left\langle \nabla f(x^{k}), \hat{x}^{k} - x^{k} \right\rangle + \frac{C}{2}\| \hat{x}^{k} - x^{k} \|^{2} + g \left( \hat{x}^{k} \right) + \frac{C}{2}\| \hat{x}^{k} - x^{k} \|^{2}, $$

and hence

$$ F(x^{k}) \ge F_{C}(x^{k}) + \frac{C}{2}\| \hat{x}^{k} - x^{k} \|^{2}. $$
(31)
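As a side remark that is not needed for the proof, when g is the group LASSO regularizer \(g(x)={\sum }_{i=1}^{N} \lambda \|x_{i}\|\) mentioned in the abstract, the point \(\hat{x}^{k}\) defined above has an explicit blockwise form (block soft-thresholding):

$$ \hat{x}^{k}_{i} = \left( 1 - \frac{\lambda}{C \|z^{k}_{i}\|} \right)_{+} z^{k}_{i}, \qquad z^{k} := x^{k} - \frac{1}{C} \nabla f(x^{k}), \qquad i = 1,\dots,N, $$

where \((\alpha)_{+} := \max \{\alpha, 0\}\) and \(\hat{x}^{k}_{i} := 0\) whenever \(z^{k}_{i} = 0\). This is the standard proximal mapping of the Euclidean norm and only serves to illustrate that \(\hat{x}^{k}\) is cheap to evaluate in the applications considered in the paper.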

For any x, let \(\tilde{x}:= \text{Proj}_{\mathcal{X}}(x)\) denote its projection onto the solution set, so that \(f(\tilde{x}) + g(\tilde{x}) = F^{*}\). Then we have

$$ \begin{array}{@{}rcl@{}} F_{C}(x) - F^{*} &=& \min_{y}\left\{ f(x) + \left\langle \nabla f(x), y-x \right\rangle + \frac{C}{2}\|y-x\|^{2} + g(y) \right\} - f(\tilde{x}) - g(\tilde{x}) \\ &\le& f(x) - f(\tilde{x}) + \left\langle \nabla f(x), \tilde{x}-x \right\rangle + \frac{C}{2}\|\tilde{x}-x\|^{2} \\ &\le& \frac{L+C}{2}\|\tilde{x}-x\|^{2} = \frac{L+C}{2}\text{dist}\left( x, {\mathcal{X}}\right)^{2}, \end{array} $$

where L is the Lipschitz constant of ∇f. Plugging \(x = x^{k}\) into the above inequalities, we have

$$ \begin{array}{@{}rcl@{}} F_{C}(x^{k}) - F^{*} &\le& \frac{L+C}{2}\text{dist}\left( x^{k}, {\mathcal{X}}\right)^{2} \\ &\le& \frac{\kappa(L+C)}{2} \|\hat{x}^{k}-x^{k}\|^{2} \\ &\le& \kappa(1+L/C)(F(x^{k}) - F_{C}(x^{k})), \end{array} $$

where the second inequality follows from (14) and the third inequality is a direct consequence of (31). Then we have

$$ \begin{array}{@{}rcl@{}} F(x^{k}) - F^{*} &=& F(x^{k}) - F_{C}(x^{k}) + F_{C}(x^{k}) - F^{*} \\ & \le& (1+\kappa(1+L/C))(F(x^{k}) - F_{C}(x^{k})). \end{array} $$
(32)

By (29), (30) and (32), we have

$$ \mathbb{E}[F(x^{k+1}) - F(x^{k}) ~|~ x^{k}] \leq \frac{1}{N} (F_{C}(x^{k}) - F(x^{k})) \le \frac{1}{N} \cdot \frac{1}{1+\kappa(1+L/C)}(F^{*} - F(x^{k})), $$

therefore

$$ \mathbb{E}[F(x^{k+1}) - F^{*} ~|~ x^{k}] \le \Big(1 - \frac{1}{N(1+\kappa(1+L/C))} \Big) (F(x^{k}) - F^{*}). $$

For any l ≥ 1, combining the above inequality over k = 0, 1, …, l − 1 and taking expectation with respect to the whole history, we obtain

$$ \mathbb{E}[F(x^{l})-F^{*}] \le \sigma^{l} (F(x^{0})-F^{*}), $$

where \(\sigma = \Big (1 - \frac {1}{N(1+\kappa (1+L/C))} \Big ) \in (0,1)\), and hence the R-BCPGM achieves a linear convergence rate in terms of the expected objective value. □
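To illustrate the theorem numerically, the following self-contained sketch runs the R-BCPGM on a small synthetic LASSO instance, for which the expected objective gap should decay geometrically as proved above. The problem data, the names and the choice \(c^{k}_{i} = C = L\) are illustrative assumptions of ours rather than specifications from the paper.

```python
import numpy as np

# Small synthetic LASSO instance: f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
rng = np.random.default_rng(0)
m, n, N = 60, 40, 8                      # rows, columns, number of coordinate blocks
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.1
blocks = np.array_split(np.arange(n), N)
L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad f
C = L                                    # take c_i^k = C = L for every block

def soft_threshold(z, tau):
    # Proximal mapping of tau*||.||_1 (coordinatewise soft-thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def F(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x = np.zeros(n)
for k in range(3000):
    i = rng.integers(N)                  # i_k drawn uniformly from {1, ..., N}
    idx = blocks[i]
    grad_i = A[:, idx].T @ (A @ x - b)   # nabla_{i_k} f(x^k)
    x[idx] = soft_threshold(x[idx] - grad_i / C, lam / C)   # scheme (3) on block i_k

print("objective after 3000 iterations:", F(x))
```

Comparing F(x^k) against a high-accuracy value of F^* (obtained, for instance, by running many more iterations) should exhibit the geometric decay of the expected gap at a rate no worse than σ.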

Cite this article

Ye, J.J., Yuan, X., Zeng, S. et al. Variational Analysis Perspective on Linear Convergence of Some First Order Methods for Nonsmooth Convex Optimization Problems. Set-Valued Var. Anal 29, 803–837 (2021). https://doi.org/10.1007/s11228-021-00591-3
