On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-11-20, DOI: arXiv-2011.10643
Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon

In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e., averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be {\em lossy compressed} versions of the local parameters. In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having {\em multiple} gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of the compressed information. In particular, we show that having $O(\log\frac{1}{\epsilon})$ gradient iterations with constant step size, and $O(\log\frac{1}{\epsilon})$ gossip steps between every pair of these iterations, enables convergence to within $\epsilon$ of the optimal value for smooth non-convex objectives satisfying the Polyak-\L{}ojasiewicz condition. This result also holds for smooth strongly convex objectives. To our knowledge, this is the first work that derives convergence results for non-convex optimization under arbitrary communication compression.
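The scheme described in the abstract — local gradient steps interleaved with multiple rounds of gossip over compressed parameters — can be sketched in a few lines. The following is a minimal illustration, not the authors' exact algorithm: the ring topology, the quadratic local objectives, and the toy uniform quantizer are all assumptions made for the example; arbitrary compression operators and general PL objectives are what the paper actually covers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 4, 3

# Doubly-stochastic mixing matrix for an assumed ring topology.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i + 1) % n_nodes] = 0.25
    W[i, (i - 1) % n_nodes] = 0.25

# Each node i holds f_i(x) = 0.5 * ||x - b_i||^2 (strongly convex, hence PL);
# the global minimizer of the average objective is the mean of the b_i.
b = rng.normal(size=(n_nodes, dim))
x_opt = b.mean(axis=0)

def quantize(v, levels=256):
    """Toy uniform quantizer standing in for an arbitrary lossy compressor."""
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * levels) / levels * scale

x = np.zeros((n_nodes, dim))     # one parameter vector per node
step = 0.5
n_grad_iters, n_gossip = 30, 5   # both O(log(1/eps)) per the stated result

for _ in range(n_grad_iters):
    # Local gradient step at every node: grad f_i(x_i) = x_i - b_i.
    x = x - step * (x - b)
    # Multiple gossip rounds on compressed parameters between gradient steps.
    for _ in range(n_gossip):
        x = W @ np.apply_along_axis(quantize, 1, x)

# Worst-case distance of any node's iterate from the optimum.
err = np.max(np.linalg.norm(x - x_opt, axis=1))
```

With several gossip rounds per gradient step, the nodes reach near-consensus between updates, so `err` shrinks to roughly the quantization noise floor rather than stalling at a consensus gap.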

Updated: 2020-11-25