Distributionally Robust Federated Averaging
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2021-02-25, DOI: arxiv-2102.12660
Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

In this paper, we study communication-efficient distributed algorithms for distributionally robust federated learning via periodic averaging with adaptive sampling. In contrast to standard empirical risk minimization, due to the minimax structure of the underlying optimization problem, a key difficulty arises from the fact that the global parameter that controls the mixture of local losses can only be updated infrequently at the global synchronization stage. To compensate for this, we propose a Distributionally Robust Federated Averaging (DRFA) algorithm that employs a novel snapshotting scheme to approximate the accumulation of historical gradients of the mixing parameter. We analyze the convergence rate of DRFA in both convex-linear and nonconvex-linear settings. We also generalize the proposed idea to objectives with regularization on the mixture parameter and propose a proximal variant, dubbed DRFA-Prox, with provable convergence rates. We further analyze an alternative optimization method for regularized cases in strongly-convex-strongly-concave and nonconvex (under the PL condition)-strongly-concave settings. To the best of our knowledge, this paper is the first to solve distributionally robust federated learning with reduced communication, and to analyze the efficiency of local descent methods on distributed minimax problems. We provide corroborating experimental evidence for our theoretical results in federated learning settings.
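To make the setup concrete, the sketch below simulates a DRFA-style training loop for the minimax objective min_w max_{λ∈Δ} Σ_i λ_i F_i(w): sampled clients run τ local SGD steps between synchronizations (periodic averaging), clients are drawn with probabilities proportional to the mixing weights λ (adaptive sampling), and λ itself is only updated at synchronization rounds. The quadratic local losses, the step sizes, and the exponentiated-gradient ascent step on the simplex are illustrative assumptions, not the authors' exact algorithm; in particular, this sketch evaluates every client's loss at each synchronization, whereas the paper's snapshotting scheme approximates that accumulated gradient information with less communication.

```python
# Minimal single-process sketch of a DRFA-style update loop, reconstructed
# from the abstract's description. Local losses, step sizes, and the
# simplex update for lambda are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_clients, dim = 5, 3
centers = rng.normal(size=(n_clients, dim))   # client i's loss: 0.5*||w - c_i||^2

def local_loss(i, w):
    return 0.5 * np.sum((w - centers[i]) ** 2)

def local_grad(i, w):
    return w - centers[i]

w = np.zeros(dim)                             # global model
lam = np.ones(n_clients) / n_clients          # mixing weights on the simplex
tau, rounds = 10, 50                          # local steps per sync round
eta_w, eta_lam = 0.05, 0.5                    # step sizes for w and lambda

for r in range(rounds):
    # Adaptive sampling: draw clients with probability proportional to lambda.
    sampled = rng.choice(n_clients, size=3, replace=False, p=lam)

    # Each sampled client runs tau local SGD steps from the current global
    # model; models are only averaged at synchronization (periodic averaging).
    local_models = []
    for i in sampled:
        wi = w.copy()
        for _ in range(tau):
            wi -= eta_w * local_grad(i, wi)
        local_models.append(wi)
    w = np.mean(local_models, axis=0)         # server-side averaging

    # Infrequent global update of lambda: the gradient of the mixture
    # objective w.r.t. lambda_i is F_i(w), so take an exponentiated-gradient
    # ascent step, which keeps lambda on the probability simplex (one assumed
    # concrete choice; DRFA's snapshotting avoids querying all clients here).
    losses = np.array([local_loss(i, w) for i in range(n_clients)])
    lam = lam * np.exp(eta_lam * losses)
    lam /= lam.sum()

print("final mixing weights:", np.round(lam, 3))
print("worst-case local loss:", max(local_loss(i, w) for i in range(n_clients)))
```

Under these assumptions, the ascent step shifts the mixing weights toward the clients with the largest losses, so the averaged model hedges against the worst-case local distribution rather than minimizing the plain average of the local losses.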

Updated: 2021-02-26