Federated Variance-Reduced Stochastic Gradient Descent With Robustness to Byzantine Attacks, IEEE Transactions on Signal Processing



IEEE Transactions on Signal Processing (IF 4.6), Pub Date: 2020-07-31, DOI: 10.1109/tsp.2020.3012952
Zhaoxian Wu, Qing Ling, Tianyi Chen, Georgios B. Giannakis

This paper deals with distributed finite-sum optimization for learning over multiple workers in the presence of malicious Byzantine attacks. Most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules. However, the sizeable stochastic gradient noise induced by SGD makes it difficult to distinguish malicious messages sent by the Byzantine attackers from noisy stochastic gradients sent by the "honest" workers. This motivates reducing the variance of stochastic gradients as a means of robustifying SGD. To this end, a novel Byzantine attack resilient distributed (Byrd-) SAGA approach is introduced for federated learning tasks involving multiple workers. Rather than the mean employed by distributed SAGA, Byrd-SAGA relies on the geometric median to aggregate the corrected stochastic gradients sent by the workers. When fewer than half of the workers are Byzantine attackers, Byrd-SAGA attains provably linear convergence to a neighborhood of the optimal solution, with the asymptotic learning error determined by the number of Byzantine workers. Numerical tests corroborate the robustness to various Byzantine attacks, as well as the merits of Byrd-SAGA over Byzantine attack resilient distributed SGD.
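
The two ingredients named in the abstract, SAGA-style gradient correction and geometric-median aggregation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the function names (`geometric_median`, `coordinate_mean`, `saga_corrected_gradient`), the use of Weiszfeld's iteration to approximate the geometric median, and the toy worker messages are all assumptions made for illustration. The correction follows the standard SAGA form (fresh gradient minus the stored gradient for the same sample, plus the average of the stored gradients).

```python
import math

def geometric_median(points, iters=200, tol=1e-9):
    """Approximate the geometric median of equal-length vectors
    using Weiszfeld's iteration (an assumed solver, not from the paper)."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    z = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        # Each point is weighted by the inverse of its distance to z;
        # the floor guards against division by zero.
        weights = [1.0 / max(math.dist(p, z), 1e-12) for p in points]
        total = sum(weights)
        z_new = [sum(w * p[k] for w, p in zip(weights, points)) / total
                 for k in range(dim)]
        shift = math.dist(z_new, z)
        z = z_new
        if shift < tol:
            break
    return z

def coordinate_mean(points):
    """Plain averaging, the aggregation rule of non-robust distributed SAGA."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]

def saga_corrected_gradient(fresh_grad, stored_grad, table_mean):
    """Standard SAGA correction: fresh gradient of the sampled term,
    minus its stored gradient, plus the mean of all stored gradients."""
    return [g - s + m for g, s, m in zip(fresh_grad, stored_grad, table_mean)]

# Hypothetical messages: 7 honest workers near `center`, 3 Byzantine outliers.
center = [1.0, -2.0]
honest = [[1.1, -2.0], [0.9, -2.1], [1.0, -1.9], [1.05, -2.05],
          [0.95, -1.95], [1.0, -2.0], [1.02, -2.08]]
byzantine = [[1000.0, 1000.0] for _ in range(3)]

messages = honest + byzantine
print("mean:            ", coordinate_mean(messages))
print("geometric median:", geometric_median(messages))
```

With fewer than half of the messages corrupted, the geometric median in this toy example stays close to the honest cluster, whereas the mean is dragged far away by the three outliers. Variance reduction matters here because it tightens the honest cluster, which in turn limits how far the attackers can shift the median.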


Updated: 2020-07-31
