Published by De Gruyter, January 22, 2021

Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems

  • Darina Dvinskikh and Alexander Gasnikov

Abstract

We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. For both the primal and the dual oracle, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of objectives, optimality in terms of the number of oracle calls per node holds only up to a logarithmic factor and up to the notion of smoothness. By using a mini-batching technique, we show that the proposed methods with a stochastic oracle can additionally be parallelized at each node. The considered algorithms can be applied to many data science problems and inverse problems.
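The parallelization via mini-batching mentioned above rests on a simple fact: averaging independent stochastic gradients reduces the oracle's variance in proportion to the batch size, and the individual oracle calls within a batch are independent, so they can be distributed across workers at a node. The following sketch (illustrative only, not the paper's algorithm; the quadratic objective and Gaussian noise model are assumptions for the example) shows a mini-batched stochastic gradient oracle:

```python
import numpy as np

def stochastic_grad(x, rng):
    # Hypothetical noisy oracle for f(x) = 0.5 * ||x||^2:
    # the true gradient x plus zero-mean Gaussian noise.
    return x + rng.normal(scale=1.0, size=x.shape)

def minibatch_grad(x, batch_size, rng):
    # Average batch_size independent oracle calls; since the calls are
    # independent, each could run on a separate worker at the node.
    # The variance of the averaged estimate is 1/batch_size times the
    # variance of a single call.
    grads = [stochastic_grad(x, rng) for _ in range(batch_size)]
    return np.mean(grads, axis=0)

rng = np.random.default_rng(0)
x = np.ones(5)
g = minibatch_grad(x, batch_size=1000, rng=rng)
# With batch_size = 1000, the per-coordinate noise standard deviation
# drops from 1.0 to about 0.03, so g is close to the true gradient x.
```

In the decentralized setting, each node would form such a mini-batched estimate of the gradient of its local objective in parallel before the communication step.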

MSC 2010: 90C25; 90C06; 90C15

Award Identifier / Grant number: 18-71-10108

Award Identifier / Grant number: 19-31-51001

Award Identifier / Grant number: 075-00337-20-03

Award Identifier / Grant number: 0714-2020-0005

Funding statement: The work of D. Dvinskikh was supported (Sections 1 and 5) by RFBR 19-31-51001 and funded (Section 6) by the Russian Science Foundation (Project 18-71-10108). The work of A. Gasnikov was supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) No. 075-00337-20-03, Project No. 0714-2020-0005.

Acknowledgements

We would like to thank F. Bach, P. Dvurechensky, E. Gorbunov, A. Koloskova, A. Kulunchakov, J. Mairal, A. Nemirovski, A. Olshevsky, S. Parsegov, B. Polyak, N. Srebro, A. Taylor and C. Uribe for useful discussions.


Received: 2020-11-19
Accepted: 2020-12-10
Published Online: 2021-01-22
Published in Print: 2021-06-01

© 2021 Walter de Gruyter GmbH, Berlin/Boston
