Abstract
Distributed optimization has developed rapidly in recent years owing to its wide applications in machine learning and signal processing. In this paper, we investigate a distributed optimization problem of minimizing a global objective that is the sum of smooth and strongly convex local cost functions distributed over an undirected network of n nodes. In contrast to existing works, we incorporate a distributed heavy-ball term to improve the convergence performance of the proposed algorithm. To accelerate existing distributed stochastic first-order gradient methods, this momentum term is combined with a gradient-tracking technique. It is shown that the proposed algorithm converges faster than GT-SAGA without increasing the complexity. Extensive experiments on real-world datasets verify the effectiveness and correctness of the proposed algorithm.
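The combination of gradient tracking with a heavy-ball momentum term described above can be illustrated with a small numerical sketch. This is not the paper's exact algorithm: it uses full local gradients rather than the SAGA-type stochastic gradients of GT-SAGA, and the network (a 5-node ring), step size, and momentum weight are illustrative assumptions. Each node holds a strongly convex least-squares cost, mixes estimates with its neighbors through a doubly stochastic matrix W, takes a heavy-ball step along its gradient tracker, and updates the tracker so that its average follows the average gradient.

```python
import numpy as np

# Hedged sketch: deterministic gradient tracking + heavy-ball momentum on a
# 5-node ring; the paper's method additionally uses stochastic variance-reduced
# (SAGA-type) gradients, which are omitted here for clarity.
rng = np.random.default_rng(0)
n, d, m = 5, 3, 10           # nodes, variable dimension, samples per node

# Local cost f_i(x) = 0.5 * ||A_i x - b_i||^2 (smooth, strongly convex w.h.p.)
A = [rng.standard_normal((m, d)) for _ in range(n)]
b = [rng.standard_normal(m) for _ in range(n)]

def grad(i, x):
    """Gradient of the i-th local cost: A_i^T (A_i x - b_i)."""
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix: Metropolis weights on an undirected ring
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3
    W[i, i] = 1 / 3

# Centralized optimum for reference: (sum_i A_i^T A_i) x* = sum_i A_i^T b_i
H = sum(Ai.T @ Ai for Ai in A)
g = sum(Ai.T @ bi for Ai, bi in zip(A, b))
x_star = np.linalg.solve(H, g)

alpha, beta = 0.01, 0.3      # illustrative step size and momentum weight
X = np.zeros((n, d))         # row i holds node i's current estimate
X_prev = X.copy()
Y = np.vstack([grad(i, X[i]) for i in range(n)])   # gradient trackers

for _ in range(4000):
    # Consensus + gradient step + heavy-ball momentum term
    X_new = W @ X - alpha * Y + beta * (X - X_prev)
    # Gradient tracking: the average of Y follows the average local gradient
    Y = W @ Y + np.vstack([grad(i, X_new[i]) - grad(i, X[i])
                           for i in range(n)])
    X_prev, X = X, X_new
```

After the loop, every row of `X` should agree with `x_star` to high accuracy, showing both consensus across nodes and convergence to the global minimizer; the stochastic, variance-reduced version replaces `grad` with SAGA-style estimators while keeping the same mixing, tracking, and momentum structure.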
Author information
Authors and Affiliations
Contributions
Bihao SUN designed the research, processed the data, and drafted the manuscript. Jinhui HU, Dawen XIA, and Huaqing LI helped organize the manuscript and process the data. Bihao SUN and Huaqing LI revised and finalized the paper.
Corresponding author
Ethics declarations
Bihao SUN, Jinhui HU, Dawen XIA, and Huaqing LI declare that they have no conflict of interest.
Additional information
Project supported by the Open Research Fund Program of Data Recovery Key Laboratory of Sichuan Province, China (No. DRN2001), the National Natural Science Foundation of China (Nos. 61773321 and 61762020), the Science and Technology Top-Notch Talents Support Project of Colleges and Universities in Guizhou Province, China (No. QJHKY2016065), the Science and Technology Foundation of Guizhou Province, China (No. QKHJC20181083), and the Science and Technology Talents Fund for Excellent Young of Guizhou Province, China (No. QKHPTRC20195669).
Huaqing LI received his BS degree in information and computing science in 2009 from Chongqing University of Posts and Telecommunications, Chongqing, China, and his PhD degree in computer science and technology in 2013 from Chongqing University. From Sept. 2014 to Sept. 2015, he was a postdoctoral researcher at the School of Electrical and Information Engineering, The University of Sydney, Australia. From Nov. 2015 to Nov. 2016, he was a postdoctoral researcher at the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He is currently a professor at the College of Electronic and Information Engineering, Southwest University, Chongqing, China. His main research interests include nonlinear dynamics and control, multi-agent systems, and distributed optimization. Prof. LI currently serves as a regional editor for Neur Comput Appl, an editorial board member for IEEE Access, and a corresponding expert for Front Inform Technol Electron Eng.
Rights and permissions
About this article
Cite this article
Sun, B., Hu, J., Xia, D. et al. A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration. Front Inform Technol Electron Eng 22, 1463–1476 (2021). https://doi.org/10.1631/FITEE.2000615
Key words
- Distributed optimization
- High-performance algorithm
- Multi-agent system
- Machine-learning problem
- Stochastic gradient