Abstract
We present a new class of decentralized first-order methods for nonsmooth and stochastic optimization problems defined over multiagent networks. Considering that communication is a major bottleneck in decentralized optimization, our main goal in this paper is to develop algorithmic frameworks which can significantly reduce the number of inter-node communications. Our major contribution is to present a new class of decentralized primal–dual type algorithms, namely the decentralized communication sliding (DCS) methods, which can skip the inter-node communications while agents solve the primal subproblems iteratively through linearizations of their local objective functions. By employing DCS, agents can find an \(\epsilon \)-solution both in terms of functional optimality gap and feasibility residual in \({{\mathcal {O}}}(1/\epsilon )\) (resp., \({{\mathcal {O}}}(1/\sqrt{\epsilon })\)) communication rounds for general convex functions (resp., strongly convex functions), while maintaining the \({{\mathcal {O}}}(1/\epsilon ^2)\) (resp., \(\mathcal{O}(1/\epsilon )\)) bound on the total number of intra-node subgradient evaluations. We also present a stochastic counterpart for these algorithms, denoted by SDCS, for solving stochastic optimization problems whose objective function cannot be evaluated exactly. In comparison with existing results for decentralized nonsmooth and stochastic optimization, we can reduce the total number of inter-node communication rounds by orders of magnitude while still maintaining the optimal complexity bounds on intra-node stochastic subgradient evaluations. The bounds on the (stochastic) subgradient evaluations are actually comparable to those required for centralized nonsmooth and stochastic optimization under certain conditions on the target accuracy.
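To make the communication-sliding pattern described above concrete, the following toy sketch alternates one round of neighbor averaging with several purely local subgradient steps, so that subgradient evaluations outnumber communication rounds. This is an illustration of the pattern only, not the exact DCS update: the function names, the mixing step, the step-size rule, and the averaged output are our simplifying assumptions.

```python
import numpy as np

def communication_sliding_sketch(subgrads, W, x0, K=50, T=10, eta=1.0):
    """Toy illustration of the communication-sliding pattern (not DCS itself).

    subgrads : list of callables; subgrads[i](u) returns a subgradient
               of agent i's local objective f_i at u.
    W        : m-by-m doubly stochastic mixing matrix of the network.
    x0       : m-by-d array; row i is agent i's initial point.

    Each outer iteration spends ONE inter-node communication round
    (multiplication by W) followed by T intra-node subgradient steps,
    so local evaluations outnumber communications by a factor of T.
    """
    x = np.array(x0, dtype=float)
    m, _ = x.shape
    for _ in range(K):
        x = W @ x                            # single communication round
        for i in range(m):                   # local computations, no communication
            u = x[i].copy()
            for t in range(T):               # T linearized subgradient steps
                g = subgrads[i](u)
                u = u - g / (eta * (t + 1))  # diminishing-step prox-type update
            x[i] = u
    return x.mean(axis=0)                    # averaged output across agents
```

For example, with three agents holding quadratic objectives \(f_i(x) = \tfrac{1}{2}\Vert x - a_i\Vert^2\) over a complete graph, the sketch recovers the average of the \(a_i\)'s, the minimizer of the aggregate objective.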
Notes
We can define the norm associated with \({{\mathbf {X}}}\) in a more general way, e.g., \( \Vert {\mathbf {x}}\Vert ^2:={\sum }_{i=1}^m p_i\Vert x_i\Vert _{X_i}^2, \ \forall {\mathbf {x}}=(x_1,\ldots ,x_m) \in {{\mathbf {X}}}, \) for some \(p_i > 0\), \(i = 1, \ldots ,m\). Accordingly, the prox-function \({\mathbf {V}}(\cdot ,\cdot )\) can be defined as \( {\mathbf {V}}({\mathbf {x}},{\mathbf {u}}):={\sum }_{i=1}^m p_iV_i(x_i,u_i), \ \forall {\mathbf {x}},{\mathbf {u}}\in {{\mathbf {X}}}. \) This setting gives us flexibility to choose \(p_i\)’s based on the information of individual \(X_i\)’s, and the possibility to further refine the convergence results.
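A minimal sketch of this weighted setting, assuming Euclidean prox-functions \(V_i(x_i,u_i)=\tfrac{1}{2}\Vert x_i-u_i\Vert^2\) for concreteness (the function names are ours):

```python
import numpy as np

def weighted_norm_sq(x, p):
    """||x||^2 = sum_i p_i ||x_i||^2 for a block vector x = (x_1, ..., x_m)."""
    return sum(pi * np.dot(xi, xi) for pi, xi in zip(p, x))

def weighted_prox(x, u, p):
    """V(x, u) = sum_i p_i V_i(x_i, u_i), with Euclidean blocks
    V_i(x_i, u_i) = 0.5 * ||x_i - u_i||^2."""
    return sum(pi * 0.5 * np.dot(xi - ui, xi - ui)
               for pi, xi, ui in zip(p, x, u))
```

Choosing unequal \(p_i\)'s lets agents with larger or better-conditioned local sets \(X_i\) carry more weight in the aggregate prox term.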
Observe that condition (3.32) is needed only when \(\mu >0\), that is, when the objective functions \(f_i\) are strongly convex.
We added the subscript i to emphasize that this inequality holds for any agent \(i \in {\mathcal {N}}\) with \(\phi = f_i\). More specifically, \(\varPhi _i(u_i) := \langle w_i, u_i\rangle + f_i(u_i) + \eta V_i(x_i,u_i)\).
We added the superscript k in \(\delta _i^{t-1,k}\) to emphasize that this error is generated at the k-th outer loop.
We generated random graphs according to the Erdős–Rényi model using a MATLAB function written by Pablo Blider, which can be found at https://www.mathworks.com/matlabcentral/fileexchange/4206.
This real dataset can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
We choose the average of the iterates obtained by the first agent as the output solution for the distributed dual averaging method in all three problems.
References
Arrow, K., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-linear Programming. Stanford Mathematical Studies in the Social Sciences. Stanford University Press, Stanford (1958)
Aybat, N.S., Hamedani, E.Y.: A primal-dual method for conic constrained distributed optimization problems. In: Advances in Neural Information Processing Systems, pp. 5049–5057 (2016)
Aybat, N.S., Wang, Z., Lin, T., Ma, S.: Distributed linearized alternating direction method of multipliers for composite convex consensus optimization. IEEE Trans. Autom. Control 63(1), 5–20 (2018)
Bertsekas, D.P.: Incremental proximal methods for large scale convex optimization. Math. Program. 129, 163–195 (2011)
Bertsekas, D.P.: Incremental aggregated proximal and augmented Lagrangian algorithms. Technical Report LIDS-P-3176, Laboratory for Information and Decision Systems (2015)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Bradley, P.S., Mangasarian, O.L.: Feature selection via concave minimization and support vector machines. In: ICML, vol. 98, pp. 82–90 (1998)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159(1–2), 253–287 (2016)
Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Chang, T., Hong, M.: Stochastic proximal gradient consensus over random networks. arxiv:1511.08905 (2015)
Chang, T., Hong, M., Wang, X.: Multi-agent distributed optimization via inexact consensus ADMM. arxiv:1402.6065 (2014)
Chen, A., Ozdaglar, A.: A fast distributed proximal gradient method. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 601–608 (2012)
Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)
Dang, C., Lan, G.: Randomized first-order methods for saddle point optimization. Technical Report 32611, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL (2015)
Deng, Q., Lan, G., Rangarajan, A.: Randomized block subgradient methods for convex nonsmooth and stochastic optimization. arXiv preprint arXiv:1509.04609 (2015)
Duchi, J., Agarwal, A., Wainwright, M.: Dual averaging for distributed optimization: convergence analysis and network scaling. IEEE Trans. Autom. Control 57(3), 592–606 (2012)
Durham, J.W., Franchi, A., Bullo, F.: Distributed pursuit-evasion without mapping or global localization via local frontiers. Auton. Robots 32(1), 81–95 (2012)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: a generic algorithmic framework. SIAM J. Optim. 22(4), 1469–1492 (2012)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. SIAM J. Optim. 23(4), 2061–2089 (2013)
Gurbuzbalaban, M., Ozdaglar, A., Parrilo, P.: On the convergence rate of incremental aggregated gradient algorithms. arxiv:1506.02081 (2015)
He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
He, N., Juditsky, A., Nemirovski, A.: Mirror prox algorithm for multi-term composite minimization and semi-separable problems. Comput. Optim. Appl. 61(2), 275–319 (2015)
Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, New York (1991)
Jadbabaie, A., Lin, J., Morse, A.S.: Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Trans. Autom. Control 48(6), 988–1001 (2003)
Jakovetic, D., Xavier, J., Moura, J.: Fast distributed gradient methods. IEEE Trans. Autom. Control 59(5), 1131–1145 (2014)
Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133(1), 365–397 (2012)
Lan, G.: Gradient sliding for composite optimization. Math. Program. 159(1), 201–235 (2016)
Lan, G., Nemirovski, A., Shapiro, A.: Validation analysis of mirror descent stochastic approximation method. Math. Program. 134(2), 425–458 (2012)
Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. arxiv:1507.02000 (2015)
Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming, vol. 2. Springer, Berlin (1984)
Makhdoumi, A., Ozdaglar, A.: Convergence rate of distributed ADMM over networks. arxiv:1601.00194 (2016)
Mokhtari, A., Shi, W., Ling, Q., Ribeiro, A.: DQM: decentralized quadratically approximated alternating direction method of multipliers. arxiv:1508.02073 (2015)
Mokhtari, A., Shi, W., Ling, Q., Ribeiro, A.: A decentralized second-order method with exact linear convergence rate for consensus optimization. arxiv:1602.00596 (2016)
Monteiro, R.D.C., Svaiter, B.F.: On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM J. Optim. 20(6), 2755–2787 (2010)
Monteiro, R.D.C., Svaiter, B.F.: Complexity of variants of Tseng's modified F-B splitting and Korpelevich's methods for hemivariational inequalities with applications to saddle-point and convex optimization problems. SIAM J. Optim. 21(4), 1688–1720 (2011)
Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)
Nedić, A.: Asynchronous broadcast-based convex optimization over a network. IEEE Trans. Autom. Control 56(6), 1337–1351 (2011)
Nedić, A., Bertsekas, D.P., Borkar, V.S.: Distributed asynchronous incremental subgradient methods. Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications, pp. 311–407 (2001)
Nedić, A., Olshevsky, A.: Distributed optimization over time-varying directed graphs. IEEE Trans. Autom. Control 60(3), 601–615 (2015)
Nedić, A., Olshevsky, A., Shi, W.: Achieving geometric convergence for distributed optimization over time-varying graphs. arxiv:1607.03218 (2016)
Nedić, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)
Nemirovski, A.S.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15, 229–251 (2005)
Nemirovski, A.S., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19, 1574–1609 (2009)
Nemirovski, A.S., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley, New York (1983)
Nesterov, Y.E.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Ouyang, Y., Chen, Y., Lan, G., Pasiliao Jr., E.: An accelerated linearized alternating direction method of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)
Qu, G., Li, N.: Harnessing smoothness to accelerate distributed optimization. arxiv:1605.07112 (2016)
Rabbat, M.: Multi-agent mirror descent for decentralized stochastic optimization. In: 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 517–520 (2015)
Rabbat, M., Nowak, R.D.: Distributed optimization in sensor networks. In: IPSN, pp. 20–27 (2004)
Ram, S.S., Nedić, A., Veeravalli, V.V.: Incremental stochastic subgradient algorithms for convex optimization. SIAM J. Optim. 20(2), 691–717 (2009)
Ram, S.S., Nedić, A., Veeravalli, V.V.: Distributed stochastic subgradient projection algorithms for convex optimization. J. Optim. Theory Appl. 147, 516–545 (2010)
Ram, S.S., Veeravalli, V.V., Nedić, A.: Distributed non-autonomous power control through distributed convex optimization. In: IEEE INFOCOM, pp. 3001–3005 (2009)
Shi, W., Ling, Q., Wu, G., Yin, W.: On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Sig. Process. 62(7), 1750–1761 (2014)
Shi, W., Ling, Q., Wu, G., Yin, W.: Extra: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015)
Shi, W., Ling, Q., Wu, G., Yin, W.: A proximal gradient algorithm for decentralized composite optimization. IEEE Trans. Sig. Process. 63(22), 6013–6023 (2015)
Simonetto, A., Kester, L., Leus, G.: Distributed time-varying stochastic optimization and utility-based communication. arxiv:1408.5294 (2014)
Terelius, H., Topcu, U., Murray, R.: Decentralized multi-agent optimization via dual decomposition. IFAC Proc. Vol. 44(1), 11245–11251 (2011)
Tsianos, K., Lawlor, S., Rabbat, M.: Consensus-based distributed optimization: practical issues and applications in large-scale machine learning. In: Proceedings of the 50th Allerton Conference on Communication, Control, and Computing (2012)
Tsianos, K., Rabbat, M.: Consensus-based distributed online prediction and optimization. In: 2013 IEEE Global Conference on Signal and Information Processing, pp. 807–810 (2013)
Tsitsiklis, J., Bertsekas, D., Athans, M.: Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans. Autom. Control 31(9), 803–812 (1986)
Tsitsiklis, J.N.: Problems in Decentralized Decision Making and Computation. Ph.D. thesis, Massachusetts Inst. Technol., Cambridge, MA (1984)
Wang, M., Bertsekas, D.P.: Incremental constraint projection-proximal methods for nonsmooth convex optimization. Technical Report LIDS-P-2907, Laboratory for Information and Decision Systems (2013)
Wei, E., Ozdaglar, A.: On the \({O}(1/k)\) convergence of asynchronous distributed alternating direction method of multipliers. arxiv:1307.8254 (2013)
Xi, C., Wu, Q., Khan, U.A.: Distributed mirror descent over directed graphs. arxiv:1412.5526 (2014)
Zhu, J., Rosset, S., Tibshirani, R., Hastie, T.J.: 1-norm support vector machines. In: Advances in neural information processing systems, pp. 49–56 (2004)
Zhu, M., Martinez, S.: On distributed convex optimization under inequality and equality constraints. IEEE Trans. Autom. Control 57(1), 151–164 (2012)
This work was funded by National Science Foundation Grants 1637473 and 1637474, and Office of Naval Research grant N00014-16-1-2802.
Lan, G., Lee, S. & Zhou, Y. Communication-efficient algorithms for decentralized and stochastic optimization. Math. Program. 180, 237–284 (2020). https://doi.org/10.1007/s10107-018-1355-4
Keywords
- Decentralized optimization
- Decentralized machine learning
- Communication efficient
- Stochastic programming
- Nonsmooth functions
- Primal–dual method
- Complexity