Communications in Mathematical Sciences

Volume 19 (2021)

Number 3

A sharp convergence rate for a model equation of the asynchronous stochastic gradient descent

Pages: 851 – 863

(Fast Communication)

DOI: https://dx.doi.org/10.4310/CMS.2021.v19.n3.a13

Authors

Yuhua Zhu (Department of Mathematics, Stanford University, Stanford, California, U.S.A.)

Lexing Ying (Department of Mathematics, Stanford University, Stanford, California, U.S.A.)

Abstract

We give a sharp convergence rate for asynchronous stochastic gradient descent (ASGD) algorithms when the loss function is a perturbed quadratic function, based on the stochastic modified equations introduced in [An et al., "Stochastic modified equations for the asynchronous stochastic gradient descent", arXiv:1805.08244]. We prove that if the number of local workers is larger than the expected staleness, then ASGD is more efficient than stochastic gradient descent. Our theoretical result also suggests that longer delays lead to a slower convergence rate. In addition, the learning rate cannot be smaller than a threshold inversely proportional to the expected staleness.
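As a rough illustration of the setting described above (not taken from the paper), the following toy Python simulation applies ASGD-style updates with stale gradients to a perturbed quadratic loss. The staleness model, parameter names (expected_staleness, lr, steps), and all numerical values are assumptions chosen purely for illustration.

import numpy as np

# Toy sketch: ASGD-style updates on a perturbed quadratic loss
# f(x) = 0.5 * x^T A x, where each update uses a gradient computed
# from a stale (delayed) copy of the parameters.
rng = np.random.default_rng(0)
dim = 10
A = np.diag(rng.uniform(0.5, 2.0, size=dim))  # positive-definite quadratic

def stochastic_grad(x):
    # Gradient of the quadratic plus small noise (a "perturbed" quadratic).
    return A @ x + 0.01 * rng.standard_normal(dim)

def asgd(expected_staleness=4, lr=0.05, steps=2000):
    x = rng.standard_normal(dim)
    history = [x.copy()]  # past parameter copies, used to model staleness
    for t in range(1, steps + 1):
        # Sample a random delay (geometric, mean ~ expected_staleness)
        # and compute the gradient at the corresponding old iterate.
        delay = min(int(rng.geometric(1.0 / expected_staleness)), t)
        g = stochastic_grad(history[t - delay])
        x = x - lr * g  # apply the stale gradient
        history.append(x.copy())
    return x

x_final = asgd()
print("final loss:", 0.5 * x_final @ A @ x_final)

In this sketch, increasing expected_staleness slows the decay of the loss, which loosely mirrors the paper's conclusion that longer delays lead to a slower convergence rate; the precise rates proved in the paper are not reproduced here.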

Keywords

asynchronous stochastic gradient descent, stochastic modified equations, distributed learning

2010 Mathematics Subject Classification

65K05, 68W15, 68W20, 90C15

Received 26 January 2020

Accepted 28 November 2020

Published 5 May 2021