A Continuous-Time Analysis of Distributed Stochastic Gradient
Neural Computation (IF 2.7). Pub Date: 2020-01-01. DOI: 10.1162/neco_a_01248
Nicholas M. Boffi, Jean-Jacques E. Slotine

We analyze the effect of synchronization on distributed stochastic gradient algorithms. By exploiting an analogy with dynamical models of biological quorum sensing, where synchronization between agents is induced through communication with a common signal, we quantify how synchronization can significantly reduce the magnitude of the noise felt by the individual distributed agents and their spatial mean. This noise reduction is in turn associated with a reduction in the smoothing of the loss function imposed by the stochastic gradient approximation. Through simulations on model nonconvex objectives, we demonstrate that coupling can stabilize higher noise levels and improve convergence. We provide a convergence analysis for strongly convex functions by deriving a bound on the expected deviation of the spatial mean of the agents from the global minimizer for an algorithm based on quorum sensing, the same algorithm with momentum, and the elastic averaging SGD (EASGD) algorithm. We discuss extensions to new algorithms that allow each agent to broadcast its current measure of success and shape the collective computation accordingly. We supplement our theoretical analysis with numerical experiments on convolutional neural networks trained on the CIFAR-10 data set, where we note a surprising regularizing property of EASGD even when applied to the non-distributed case. This observation suggests alternative second-order in time algorithms for non-distributed optimization that are competitive with momentum methods.
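
The coupling described in the abstract can be made concrete with a small simulation. The sketch below is not the authors' code: the quadratic objective, coupling gain k, noise scale sigma, and step size eta are illustrative assumptions. It runs p agents under quorum-sensing-style coupling, in which each agent is attracted toward the spatial mean of all agents, so that averaging the p independent noise terms shrinks the noise felt by the mean by roughly a factor of 1/sqrt(p).

```python
# A minimal sketch, not the authors' code: p agents run noisy gradient
# descent on a toy strongly convex objective, with quorum-sensing-style
# coupling that attracts each agent to the spatial mean of all agents.
# The quadratic objective and the values of k, sigma, and eta are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

def grad(x):
    # Gradient of f(x) = 0.5 * ||x||^2; the global minimizer is the origin.
    return x

p, d = 10, 2                      # number of agents, problem dimension
eta, k, sigma = 0.05, 1.0, 0.5    # step size, coupling gain, noise scale
x = rng.normal(size=(p, d))       # independent agent initializations

for _ in range(2000):
    x_bar = x.mean(axis=0)        # spatial mean of the agents
    noise = sigma * rng.normal(size=(p, d))
    # Euler-Maruyama step of dx_i = [-grad f(x_i) + k (x_bar - x_i)] dt + sigma dW_i
    x += eta * (-grad(x) + k * (x_bar - x)) + np.sqrt(eta) * noise

# The mean averages p independent noise terms, so it sits much closer to
# the minimizer than any individual agent does.
print("||x_bar||:      ", np.linalg.norm(x.mean(axis=0)))
print("mean ||x_i||:   ", np.linalg.norm(x, axis=1).mean())
```

Replacing the direct attraction to the spatial mean with an auxiliary center variable that itself relaxes toward the agents would give an EASGD-style scheme; in the single-agent case that pair of coupled updates is second order in time, consistent with the momentum-like behavior noted in the abstract.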
