Dual-Free Stochastic Decentralized Optimization with Variance Reduction
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-06-25, DOI: arxiv-2006.14384
Hadrien Hendrikx, Francis Bach, Laurent Massoulié

We consider the problem of training machine learning models on distributed data in a decentralized way. For finite-sum problems, fast single-machine algorithms for large datasets rely on stochastic updates combined with variance reduction. Yet, existing decentralized stochastic algorithms either do not obtain the full speedup allowed by stochastic updates, or require oracles that are more expensive than regular gradients. In this work, we introduce a Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only requires computing stochastic gradients of the local functions, and is computationally as fast as a standard stochastic variance-reduced algorithm run on a $1/n$ fraction of the dataset, where $n$ is the number of nodes. To derive DVR, we use Bregman coordinate descent on a well-chosen dual problem, and obtain a dual-free algorithm using a specific Bregman divergence. We give an accelerated version of DVR based on the Catalyst framework, and illustrate its effectiveness with simulations on real data.
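As a rough sketch of the setup described in the abstract (the notation $m$, $f_{ij}$, and $h$ is illustrative and not taken from the paper), the decentralized finite-sum problem can be written as

$$\min_{x \in \mathbb{R}^d} \; \sum_{i=1}^{n} f_i(x), \qquad f_i(x) = \sum_{j=1}^{m} f_{ij}(x),$$

where node $i$ holds the local function $f_i$ and only ever evaluates stochastic gradients $\nabla f_{ij}(x)$ of its own samples. The dual-free updates mentioned above rely on a Bregman divergence of the standard form

$$D_h(x, y) = h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle,$$

where the particular choice of the reference function $h$ is what removes the need for dual gradient oracles.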

Updated: 2020-06-26