Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air
arXiv - CS - Information Theory Pub Date : 2019-01-03 , DOI: arxiv-1901.00844 Mohammad Mohammadi Amiri and Deniz Gunduz
We study federated machine learning (ML) at the wireless edge, where power-
and bandwidth-limited wireless devices with local datasets carry out
distributed stochastic gradient descent (DSGD) with the help of a remote
parameter server (PS). Standard approaches assume separate computation and
communication, where local gradient estimates are compressed and transmitted to
the PS over orthogonal links. Following this digital approach, we introduce
D-DSGD, in which the wireless devices employ gradient quantization and error
accumulation, and transmit their gradient estimates to the PS over a multiple
access channel (MAC). We then introduce a novel analog scheme, called A-DSGD,
which exploits the additive nature of the wireless MAC for over-the-air
gradient computation, and provide convergence analysis for this approach. In
A-DSGD, the devices first sparsify their gradient estimates, and then project
them to a lower dimensional space imposed by the available channel bandwidth.
These projections are sent directly over the MAC without employing any digital
code. Numerical results show that A-DSGD converges faster than D-DSGD thanks to
its more efficient use of the limited bandwidth and the natural alignment of
the gradient estimates over the channel. The improvement is particularly
compelling in the low-power and low-bandwidth regimes. We also illustrate, for a
classification problem, that A-DSGD is more robust to bias in the data
distribution across devices, while D-DSGD significantly outperforms other
digital schemes in the literature. We further observe that both D-DSGD and
A-DSGD perform better as the number of devices increases (while the total
dataset size is kept constant), demonstrating their ability to harness the
computation power of edge devices.
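The A-DSGD pipeline described above — each device sparsifies its gradient estimate, projects it to the lower-dimensional space imposed by the channel bandwidth, and transmits the analog signal over an additive MAC so the signals superpose in the air — can be sketched in a few lines. This is a minimal, hypothetical simulation under assumed parameters (dimensions, device count, noise level, top-k sparsification, and a shared Gaussian projection matrix are all illustrative choices, not the paper's exact construction):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 1000          # model dimension (assumed)
s = 100           # channel uses per iteration, i.e. available bandwidth (s < d)
num_devices = 10
noise_std = 0.01  # receiver noise level (assumed)

# Hypothetical local gradient estimates, one per device.
grads = [rng.normal(size=d) for _ in range(num_devices)]

def sparsify(g, k):
    """Keep the k largest-magnitude entries, zero the rest (top-k sparsification)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

# A shared random projection to the s-dimensional channel space,
# assumed known to both the devices and the parameter server.
A = rng.normal(size=(s, d)) / np.sqrt(s)

# Each device sparsifies and projects; the additive MAC sums the
# simultaneously transmitted analog signals, plus receiver noise.
tx = [A @ sparsify(g, k=50) for g in grads]
rx = np.sum(tx, axis=0) + noise_std * rng.normal(size=s)

# The PS observes only the noisy sum; dividing by the device count
# yields a compressed estimate of the average sparsified gradient.
avg_projected = rx / num_devices
```

The point of the sketch is that no per-device decoding is needed: the channel itself computes the sum the PS wants, which is why the scheme uses bandwidth more efficiently than transmitting each device's estimate over orthogonal links.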
Updated: 2020-04-08