Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air
arXiv - CS - Information Theory Pub Date : 2019-01-03 , DOI: arxiv-1901.00844 Mohammad Mohammadi Amiri and Deniz Gunduz
We study federated machine learning (ML) at the wireless edge, where power-
and bandwidth-limited wireless devices with local datasets carry out
distributed stochastic gradient descent (DSGD) with the help of a remote
parameter server (PS). Standard approaches assume separate computation and
communication, where local gradient estimates are compressed and transmitted to
the PS over orthogonal links. Following this digital approach, we introduce
D-DSGD, in which the wireless devices employ gradient quantization and error
accumulation, and transmit their gradient estimates to the PS over a multiple
access channel (MAC). We then introduce a novel analog scheme, called A-DSGD,
which exploits the additive nature of the wireless MAC for over-the-air
gradient computation, and provide convergence analysis for this approach. In
A-DSGD, the devices first sparsify their gradient estimates, and then project
them to a lower dimensional space imposed by the available channel bandwidth.
These projections are sent directly over the MAC without employing any digital
code. Numerical results show that A-DSGD converges faster than D-DSGD thanks to
its more efficient use of the limited bandwidth and the natural alignment of
the gradient estimates over the channel. The improvement is particularly
compelling in the low-power and low-bandwidth regimes. We also illustrate, for a
classification problem, that A-DSGD is more robust to bias in the data
distribution across devices, while D-DSGD significantly outperforms other
digital schemes in the literature. We further observe that both D-DSGD and
A-DSGD perform better as the number of devices increases (while the total
dataset size is kept constant), demonstrating their ability to harness the
computation power of edge devices.
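The A-DSGD pipeline described above — each device sparsifies its gradient estimate, projects it to the lower-dimensional space imposed by the channel bandwidth, and transmits the analog signal over an additive MAC so the signals superpose in the air — can be sketched in a few lines. This is a minimal, hypothetical simulation under assumed parameters (dimensions, device count, noise level, top-k sparsification, and a shared Gaussian projection matrix are all illustrative choices, not the paper's exact construction):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 1000          # model dimension (assumed)
s = 100           # channel uses per iteration, i.e. available bandwidth (s < d)
num_devices = 10
noise_std = 0.01  # receiver noise level (assumed)

# Hypothetical local gradient estimates, one per device.
grads = [rng.normal(size=d) for _ in range(num_devices)]

def sparsify(g, k):
    """Keep the k largest-magnitude entries, zero the rest (top-k sparsification)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

# A shared random projection to the s-dimensional channel space,
# assumed known to both the devices and the parameter server.
A = rng.normal(size=(s, d)) / np.sqrt(s)

# Each device sparsifies and projects; the additive MAC sums the
# simultaneously transmitted analog signals, plus receiver noise.
tx = [A @ sparsify(g, k=50) for g in grads]
rx = np.sum(tx, axis=0) + noise_std * rng.normal(size=s)

# The PS observes only the noisy sum; dividing by the device count
# yields a compressed estimate of the average sparsified gradient.
avg_projected = rx / num_devices
```

The point of the sketch is that no per-device decoding is needed: the channel itself computes the sum the PS wants, which is why the scheme uses bandwidth more efficiently than transmitting each device's estimate over orthogonal links.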
Updated: 2020-04-08