Decentralized Local Stochastic Extra-Gradient for Variational Inequalities
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2021-06-15, DOI: arxiv-2106.08315
Aleksandr Beznosikov, Pavel Dvurechensky, Anastasia Koloskova, Valentin Samokhin, Sebastian U Stich, Alexander Gasnikov

We consider decentralized stochastic variational inequalities where the problem data is distributed across many participating devices (the heterogeneous, or non-IID, data setting). We propose a novel method, based on stochastic extra-gradient, in which participating devices can communicate over arbitrary, possibly time-varying network topologies. This covers both the fully decentralized optimization setting and the centralized topologies commonly used in Federated Learning. Our method further supports multiple local updates on the workers to reduce the communication frequency between them. We theoretically analyze the proposed scheme in the strongly monotone, monotone, and non-monotone settings. As a special case, our method and analysis apply in particular to decentralized stochastic min-max problems, which are being studied with increasing interest in Deep Learning. For example, the training objective of Generative Adversarial Networks (GANs) is typically a saddle point problem, and the decentralized training of GANs has been reported to be extremely challenging. While state-of-the-art techniques rely on either repeated gossip rounds or proximal updates, we alleviate both of these requirements. Experimental results for decentralized GANs demonstrate the effectiveness of our proposed algorithm.
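
A variational inequality asks for a point z* with ⟨F(z*), z − z*⟩ ≥ 0 for all feasible z; for a saddle point problem min_x max_y f(x, y), the operator is F = (∇_x f, −∇_y f). The sketch below is not the paper's algorithm, only a minimal illustration of its two main ingredients on a toy bilinear game: the classical stochastic extra-gradient step, and a decentralized loop that alternates communication-free local updates with gossip averaging. The operator F, the mixing matrix W, the step size gamma, and the local-step count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A_true = rng.standard_normal((d, d))

def F(z, noise_scale=0.1):
    """Stochastic operator of the toy bilinear saddle point min_x max_y x^T A y:
    F(x, y) = (A y, -A^T x), evaluated here with a noisy sample of A."""
    x, y = z[:d], z[d:]
    A = A_true + noise_scale * rng.standard_normal((d, d))
    return np.concatenate([A @ y, -A.T @ x])

def extragradient_step(z, gamma=0.05):
    """One stochastic extra-gradient step: extrapolate with the operator at z,
    then update with the operator evaluated at the extrapolated point."""
    z_half = z - gamma * F(z)
    return z - gamma * F(z_half)

# Decentralized sketch: each of n workers keeps a local iterate, performs a few
# communication-free local steps, then averages with its neighbors through a
# doubly stochastic mixing matrix W (here: a fully connected topology).
n = 4
W = np.full((n, n), 1.0 / n)
Z = rng.standard_normal((n, 2 * d))   # one iterate per worker

for _ in range(200):
    for i in range(n):
        for _ in range(3):            # local extra-gradient updates
            Z[i] = extragradient_step(Z[i])
    Z = W @ Z                         # gossip/averaging round

# For this toy problem the unique saddle point is (x, y) = (0, 0).
print("distance to saddle point:", np.linalg.norm(Z.mean(axis=0)))
```

With noisy operator evaluations and a constant step size, the iterates settle into a neighborhood of the saddle point whose radius scales with the noise; the gossip round keeps the workers' iterates in consensus despite their heterogeneous local steps.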

Updated: 2021-06-16