Abstract
We study the linear convergence of several first-order methods, namely the proximal gradient method (PGM), the proximal alternating linearized minimization (PALM) algorithm, and the randomized block coordinate proximal gradient method (R-BCPGM), for minimizing the sum of a smooth convex function and a nonsmooth convex function. We introduce a new analytic framework based on error bounds, calmness, metric subregularity, and bounded metric subregularity. This variational analysis perspective enables us to provide concrete sufficient conditions for linear convergence and applicable approaches for calculating linear convergence rates of these first-order methods for a class of structured convex problems. In particular, for the LASSO, the fused LASSO, and the group LASSO, these conditions are satisfied automatically, and the modulus of calmness/metric subregularity is computable. Consequently, linear convergence of the first-order methods for these important applications is automatically guaranteed and the convergence rates can be calculated. The new perspective enables us to improve some existing results and to obtain results previously unknown in the literature. In particular, we improve the results on the linear convergence of the PGM and PALM for structured convex problems with a computable error bound estimate. Moreover, for the R-BCPGM applied to structured convex problems, we prove that linear convergence is ensured when the nonsmooth part of the objective function is the group LASSO regularizer.
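For concreteness, here is a minimal sketch (not taken from the paper) of the proximal gradient method applied to the LASSO problem \(\min_x \frac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1\), one of the structured problems to which the linear convergence results apply. The function names, the step size \(1/L\) with \(L = \|A\|_2^2\), and the iteration count are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient_lasso(A, b, lam, num_iters=500):
    # Proximal gradient (forward-backward) sketch for 0.5*||Ax - b||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)           # gradient of the smooth part
        x = soft_threshold(x - grad / L, lam / L)  # proximal step
    return x
```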
Acknowledgments
We are grateful to two anonymous referees for their suggestions and comments which have helped us improve the paper substantially.
The research was partially supported by NSERC, the General Research Fund of the Research Grants Council of Hong Kong (12302318), the National Natural Science Foundation of China (11971220), and the Natural Science Foundation of Guangdong Province (2019A1515011152).
Appendix
Proof of Theorem 1
Since \(L_{i_k}\) is the Lipschitz constant of \(\nabla_{i_k} f\), \(c^k_{i} \ge L_i\), and \(x^{k+1}_j = x^k_j\) for all \(j \neq i_k\), we have
which implies that
Combining this with the iteration scheme (3), we have
where \(t_{i_k}:=x_{i_k}-x^k_{i_k}\). Recall that for a given iteration point \(x^k\), the next iteration point \(x^{k+1}\) is obtained by using the scheme (3), where the index \(i_k\) is chosen from \(\{1,\dots, N\}\) uniformly at random. Conditioning on \(x^k\) and taking expectation with respect to the random index \(i_k\), we obtain
Now we estimate the right-hand side of the above inequality:
where \(t:=(t_1,\dots, t_N)\) and \(F_{C}(x) := \min_{y}\left\{ f(x) + \langle \nabla f(x), y - x \rangle + \frac{C}{2}\|y - x\|^2 + g(y) \right\}\).
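As a side remark (this identity is ours, not part of the original display, but it follows from a standard completion of squares), the inner minimization defining \(F_C\) is attained at the usual proximal gradient point:
\[
\operatorname*{argmin}_{y}\Big\{ f(x) + \langle \nabla f(x), y-x\rangle + \tfrac{C}{2}\|y-x\|^2 + g(y) \Big\}
= \operatorname{prox}_{g/C}\!\Big(x - \tfrac{1}{C}\nabla f(x)\Big),
\qquad \operatorname{prox}_{g/C}(z) := \operatorname*{argmin}_{y}\Big\{ g(y) + \tfrac{C}{2}\|y-z\|^2 \Big\}.
\]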
Furthermore, we set
It follows immediately that
which yields that
and hence
Let \(\tilde{x}:= \text{Proj}_{\mathcal{X}}(x)\) for any \(x\), so that \(f(\tilde{x}) + g(\tilde{x}) = F^{*}\). Then we have
where \(L\) is the Lipschitz constant of \(\nabla f\). Plugging \(x = x^k\) into the above inequalities, we have
where the second inequality follows from (14) and the third inequality is a direct consequence of (31). Then we have
By (29), (30) and (32), we have
therefore
For any \(l \ge 1\), combining the above inequality over \(k = 0, 1, \dots, l-1\) and taking expectation with respect to the whole history, we obtain
where \(\sigma = 1 - \frac{1}{N(1+\kappa(1+L/C))} \in (0,1)\), and hence the R-BCPGM achieves a linear convergence rate in terms of the expected objective value. □
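For readers who wish to experiment, the following is a minimal, illustrative sketch (not taken from the paper) of the randomized block coordinate proximal gradient iteration analyzed above, specialized to a least-squares loss with the group LASSO regularizer. The function names, the block constant `c_i`, and the partition `blocks` are our own illustrative choices; the paper's scheme (3) is stated for general smooth \(f\) and block-separable nonsmooth \(g\).

```python
import numpy as np

def block_soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_2 on a single block (group LASSO prox).
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= tau else (1.0 - tau / nrm) * v

def rbcpgm(A, b, lam, blocks, num_iters=1000, rng=None):
    # Sketch of R-BCPGM for 0.5*||Ax - b||^2 + lam * sum_i ||x_i||_2, with the
    # coordinates of x partitioned into `blocks`. At each iteration one block i_k
    # is drawn uniformly at random and a proximal gradient step is taken on that
    # block only, mirroring the scheme analyzed in the proof above.
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        idx = blocks[rng.integers(len(blocks))]
        c_i = max(np.linalg.norm(A[:, idx], 2) ** 2, 1e-12)  # block Lipschitz constant
        grad_i = A[:, idx].T @ (A @ x - b)                    # partial gradient
        x[idx] = block_soft_threshold(x[idx] - grad_i / c_i, lam / c_i)
    return x
```

Here `blocks` is a hypothetical partition of the coordinates into groups, e.g. `blocks = [np.arange(0, 5), np.arange(5, 10)]`.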
Keywords
- Metric subregularity
- Calmness
- Proximal gradient method
- Proximal alternating linearized minimization
- Randomized block coordinate proximal gradient method
- Linear convergence
- Variational analysis
- Machine learning
- Statistics