
A Unified Convergence Analysis of Stochastic Bregman Proximal Gradient and Extragradient Methods

Journal of Optimization Theory and Applications

Abstract

We consider a mini-batch stochastic Bregman proximal gradient method and a mini-batch stochastic Bregman proximal extragradient method for stochastic convex composite optimization problems. A simplified and unified convergence analysis framework is proposed to obtain almost sure convergence properties and expected convergence rates of the mini-batch stochastic Bregman proximal gradient method and its variants. This framework can also be used to analyze the convergence of the mini-batch stochastic Bregman proximal extragradient method, which has seldom been discussed in the literature. We point out that the standard uniformly bounded variance assumption and the usual Lipschitz gradient continuity assumption are not required in the analysis.
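To make the setting concrete, here is a minimal Python sketch of the two method classes studied in the paper. It is written under illustrative assumptions that are ours rather than the paper's: a negative-entropy Bregman kernel with the probability simplex as the feasible set (so the Bregman proximal step has a closed multiplicative form), a synthetic noisy least-squares objective, and helper names such as sbpg_step and sbpeg_step. It is not the paper's exact algorithm, step-size rule, or mini-batching scheme.

```python
import numpy as np


def sbpg_step(x, grads, step_size):
    """One mini-batch stochastic Bregman proximal gradient step (sketch).

    Assumes the negative-entropy Bregman kernel and the probability simplex
    as the feasible set, so the Bregman proximal update reduces to the
    closed-form multiplicative (mirror-descent) update.
    """
    g = np.mean(grads, axis=0)           # mini-batch gradient estimate
    y = x * np.exp(-step_size * g)       # mirror step w.r.t. the entropy kernel
    return y / y.sum()                   # renormalize back onto the simplex


def sbpeg_step(x, sample_grads, step_size):
    """One mini-batch stochastic Bregman proximal extragradient step (sketch).

    Draws two mini-batches: the first gradient estimate gives an
    extrapolation point z, the second (evaluated at z) gives the update,
    which is taken from the original point x.
    """
    z = sbpg_step(x, sample_grads(x), step_size)     # extrapolation point
    return sbpg_step(x, sample_grads(z), step_size)  # step from x using the gradient at z


if __name__ == "__main__":
    # Toy illustration: minimize a noisy least-squares loss over the simplex.
    rng = np.random.default_rng(0)
    d, batch_size = 5, 16
    A = rng.standard_normal((d, d))
    b = rng.standard_normal(d)

    def sample_grads(x):
        # A mini-batch of stochastic gradients: true gradient plus additive noise.
        noise = 0.1 * rng.standard_normal((batch_size, d))
        return A.T @ (A @ x - b) + noise

    x = np.full(d, 1.0 / d)              # start at the center of the simplex
    for k in range(1, 201):
        x = sbpeg_step(x, sample_grads, step_size=1.0 / np.sqrt(k))
    print("final iterate:", x)
```

The extragradient variant draws two mini-batches per iteration, one to form the extrapolation point and one to take the actual step from the current iterate; the diminishing step size 1/sqrt(k) above is a generic choice for illustration, not the schedule analyzed in the paper.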

Acknowledgements

The author would like to thank the referees and the associate editor for their helpful comments and suggestions. This work was partially supported by the National Natural Science Foundation of China (No. 11871135) and the Fundamental Research Funds for the Central Universities (No. DUT19K46).

Author information

Corresponding author

Correspondence to Xiantao Xiao.

Additional information

Communicated by Alfredo N. Iusem.



About this article

Cite this article

Xiao, X. A Unified Convergence Analysis of Stochastic Bregman Proximal Gradient and Extragradient Methods. J Optim Theory Appl 188, 605–627 (2021). https://doi.org/10.1007/s10957-020-01799-3


