Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters
IEEE Transactions on Signal Processing (IF 4.6) Pub Date: 2020-01-01, DOI: 10.1109/tsp.2020.3018317
Huan Li, Cong Fang, Wotao Yin, Zhouchen Lin

In this article, we study the communication and (sub)gradient computation costs in distributed optimization. We present two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters. Our first algorithm is for smooth distributed optimization, and it obtains the near-optimal $O(\sqrt{\frac{L}{\epsilon (1-\sigma _2(W))}}\log \frac{1}{\epsilon })$ communication complexity and the optimal $O(\sqrt{\frac{L}{\epsilon }})$ gradient computation complexity for $L$-smooth convex problems, where $\sigma _2(W)$ denotes the second largest singular value of the weight matrix $W$ associated with the network, and $\epsilon$ is the target accuracy. When the problem is $\mu$-strongly convex and $L$-smooth, our algorithm has the near-optimal $O(\sqrt{\frac{L}{\mu (1-\sigma _2(W))}}\log ^2\frac{1}{\epsilon })$ complexity for communications and the optimal $O(\sqrt{\frac{L}{\mu }}\log \frac{1}{\epsilon })$ complexity for gradient computations. Our communication complexities are worse than the lower bounds by only a $\log \frac{1}{\epsilon }$ factor. Our second algorithm is designed for nonsmooth distributed optimization, and it achieves both the optimal $O(\frac{1}{\epsilon \sqrt{1-\sigma _2(W)}})$ communication complexity and the optimal $O(\frac{1}{\epsilon ^2})$ subgradient computation complexity, matching the lower bounds for nonsmooth distributed optimization.
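The abstract only names the algorithmic framework, so the following is a minimal sketch, in Python/NumPy, of the standard quadratic-penalty reformulation of decentralized consensus optimization with an increasing penalty parameter and Nesterov acceleration. It is not the authors' exact method: the names (`decentralized_apm`, `grad_f`, `beta0`), the linear penalty schedule $\beta_k = \beta_0(k+1)$, and the step size $1/(L+2\beta_k)$ are illustrative assumptions.

```python
import numpy as np

# Sketch of an accelerated penalty scheme for decentralized optimization:
#   min_x  sum_i f_i(x_i) + (beta_k / 2) * x^T (I - W) x,
# where rows of x are the agents' local copies. The quadratic penalty is
# zero iff all copies agree (I - W is PSD with the consensus vectors as
# its null space), and beta_k grows so the penalized solutions approach
# the consensus optimum. Multiplying by W is one communication round.

def decentralized_apm(grad_f, x0, W, L, n_iters, beta0=1.0):
    """grad_f(X) -> stacked local gradients, one row per agent.
    W: symmetric, doubly stochastic mixing matrix of the network.
    L: smoothness constant of the local objectives (assumed known)."""
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    for k in range(n_iters):
        beta = beta0 * (k + 1)          # increasing penalty parameter
        step = 1.0 / (L + 2.0 * beta)   # penalized objective is (L + 2*beta)-smooth
        # Gradient of the penalized objective; the (I - W) @ y term costs
        # one round of communication with the neighbors encoded in W.
        g = grad_f(y) + beta * (y - W @ y)
        x_new = y - step * g
        # Nesterov extrapolation (FISTA-style momentum).
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

In this formulation each multiplication by $W$ is one round of neighbor communication, which is why the complexities above depend on the spectrum of $W$: the $1-\sigma _2(W)$ factor measures how quickly mixing through the network homogenizes the local copies. The linear schedule $\beta_k = \beta_0(k+1)$ is only a placeholder; the paper's rates rely on growing the penalty at a schedule tied to the target accuracy $\epsilon$.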

Updated: 2020-01-01