Self-balancing Federated Learning with Global Imbalanced Data in Mobile Systems, IEEE Transactions on Parallel and Distributed Systems


IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2021-01-01, DOI: 10.1109/tpds.2020.3009406
Moming Duan, Duo Liu, Xianzhang Chen, Renping Liu, Yujuan Tan, Liang Liang

Federated learning (FL) is a distributed deep learning method that enables multiple participants, such as mobile and IoT devices, to jointly train a neural network while their private training data remains on their local devices. This distributed approach is promising for mobile systems, which hold a large corpus of decentralized data and require strong privacy. However, unlike common benchmark datasets, the data in mobile systems is imbalanced, which increases the bias of the model. In this article, we demonstrate that imbalanced distributed training data degrades the accuracy of FL applications. To counter this problem, we build a self-balancing FL framework named Astraea, which alleviates the imbalance by 1) Z-score-based data augmentation, and 2) mediator-based multi-client rescheduling. The proposed framework relieves global imbalance through adaptive data augmentation and downsampling. To average out local imbalance, it creates mediators that reschedule the training of clients based on the Kullback–Leibler divergence (KLD) of their data distributions. Compared with FedAvg, the vanilla FL algorithm, Astraea improves top-1 accuracy by 4.39 percent on the imbalanced EMNIST dataset and by 6.51 percent on the imbalanced CINIC-10 dataset. Meanwhile, the communication traffic of Astraea is 75 percent lower than that of FedAvg.
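The two balancing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the algorithmic details, so the function names (`zscore_plan`, `kld_to_uniform`, `assign_clients`), the Z-score threshold, the choice of the uniform distribution as the balance target, and the greedy assignment rule are all assumptions made here for illustration.

```python
import math
import statistics


def zscore_plan(class_sizes, threshold=1.0):
    """Flag classes whose size is unusually small or large.

    Assumption: a class with z-score below -threshold is a candidate for
    augmentation, one above +threshold is a candidate for downsampling.
    """
    mean = statistics.mean(class_sizes)
    std = statistics.pstdev(class_sizes)
    plan = {"augment": [], "downsample": []}
    if std == 0:
        return plan  # perfectly balanced, nothing to do
    for label, size in enumerate(class_sizes):
        z = (size - mean) / std
        if z < -threshold:
            plan["augment"].append(label)
        elif z > threshold:
            plan["downsample"].append(label)
    return plan


def kld_to_uniform(counts):
    """KL divergence D(P || U) of the label distribution P from uniform U."""
    total = sum(counts)
    if total == 0:
        return 0.0
    k = len(counts)
    # Each term is p * log(p / (1/k)); zero-count classes contribute nothing.
    return sum((c / total) * math.log(c / total * k) for c in counts if c > 0)


def assign_clients(client_counts, per_mediator):
    """Greedily group clients so each group's merged labels are near uniform.

    Assumption: every mediator (group) repeatedly takes the unassigned client
    that minimizes the KLD of the merged label counts.
    """
    unassigned = set(range(len(client_counts)))
    num_classes = len(client_counts[0])
    groups = []
    while unassigned:
        merged = [0] * num_classes
        group = []
        while unassigned and len(group) < per_mediator:
            best = min(
                sorted(unassigned),  # sorted for deterministic tie-breaking
                key=lambda i: kld_to_uniform(
                    [m + c for m, c in zip(merged, client_counts[i])]
                ),
            )
            merged = [m + c for m, c in zip(merged, client_counts[best])]
            group.append(best)
            unassigned.remove(best)
        groups.append(group)
    return groups


if __name__ == "__main__":
    # Hypothetical per-class sample counts: class 3 is heavily under-represented.
    print(zscore_plan([100, 100, 100, 10]))
    # Hypothetical clients that each hold only one of two classes.
    print(assign_clients([[10, 0], [0, 10], [10, 0], [0, 10]], per_mediator=2))
```

In this sketch, pairing single-class clients with complementary ones drives each group's merged label distribution toward uniform, which is the intuition behind rescheduling clients by KLD.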


Updated: 2021-01-01
