A sharp convergence rate for a model equation of the asynchronous stochastic gradient descent
Communications in Mathematical Sciences (IF 1.2), Pub Date: 2021-01-01, DOI: 10.4310/cms.2021.v19.n3.a13
Yuhua Zhu, Lexing Ying

We give a sharp convergence rate for asynchronous stochastic gradient descent (ASGD) when the loss function is a perturbed quadratic, based on the stochastic modified equations introduced in [An et al., "Stochastic modified equations for the asynchronous stochastic gradient descent", arXiv:1805.08244]. We prove that when the number of local workers is larger than the expected staleness, ASGD is more efficient than stochastic gradient descent. Our theoretical result also suggests that longer delays lead to a slower convergence rate. Moreover, the learning rate cannot be smaller than a threshold inversely proportional to the expected staleness.
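To make the staleness effect concrete, here is a minimal simulation sketch, not taken from the paper: it runs stochastic gradient descent on a noisy quadratic loss where each update uses the gradient at an iterate from tau steps in the past, with tau drawn geometrically so its mean matches a chosen expected staleness. The quadratic, step size, noise level, and staleness distribution are all illustrative assumptions.

```python
import numpy as np

def run_asgd(expected_staleness, steps=2000, eta=0.05, noise_std=0.01, seed=0):
    """Gradient descent on f(x) = 0.5 * x^T A x with stale, noisy gradients.

    expected_staleness -- mean delay (in iterations) of the gradient used;
                          0 recovers synchronous SGD. All parameters are
                          assumptions for demonstration only.
    """
    rng = np.random.default_rng(seed)
    dim = 10
    A = np.diag(rng.uniform(0.5, 2.0, size=dim))  # well-conditioned quadratic
    x = rng.normal(size=dim)
    history = [x.copy()]  # past iterates; stale gradients are read from here
    for _ in range(steps):
        if expected_staleness > 0:
            # Geometric staleness with the chosen mean (an assumption);
            # the update uses the gradient at an iterate tau steps old.
            tau = min(rng.geometric(1.0 / expected_staleness), len(history))
            x_eval = history[-tau]
        else:
            x_eval = x  # fresh gradient: ordinary SGD
        grad = A @ x_eval + noise_std * rng.normal(size=dim)
        x = x - eta * grad
        history.append(x.copy())
    return np.linalg.norm(x)

# Comparing mean delays illustrates the abstract's qualitative claim
# that longer delays slow per-iteration convergence.
for staleness in [0, 2, 8]:
    print(f"E[staleness] = {staleness}: final ||x|| = {run_asgd(staleness):.2e}")
```

Under these assumptions, a larger mean delay should generally leave the iterate farther from the optimum after the same number of updates, consistent with the abstract's statement that longer delays slow convergence.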

Last updated: 2021-01-01