Guided parallelized stochastic gradient descent for delay compensation
arXiv - CS - Neural and Evolutionary Computing | Pub Date: 2021-01-17 | DOI: arxiv-2101.07259
Anuraganand Sharma

Stochastic gradient descent (SGD) and its variants have been used effectively to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice because, by nature, it optimizes the error function sequentially. This has led to the development of parallel SGD algorithms, such as asynchronous SGD (ASGD) and synchronous SGD (SSGD), for training deep neural networks. Parallelization, however, introduces high variance because parameter (weight) updates are applied with a delay. Our proposed algorithm addresses this delay and aims to minimize its impact. We employ guided SGD (gSGD), which encourages consistent examples to steer the convergence by compensating for the unpredictable deviation caused by the delay. Its convergence rate is similar to that of A/SSGD, although some additional (parallel) processing is required to compensate for the delay. Experimental results demonstrate that the proposed approach mitigates the impact of delay on classification accuracy. The guided approach with SSGD clearly outperforms plain SSGD and, on some benchmark datasets, even achieves accuracy close to that of sequential SGD.
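To make the delay issue concrete, the following minimal sketch simulates synchronous data-parallel SGD on a toy linear-regression task, where workers compute gradients on parameters that are a few steps stale, and a simple drift-based damping heuristic (the compensate flag) stands in for delay compensation. All names (n_workers, delay, compensate) and the damping rule are assumptions for illustration only; this is not the paper's gSGD update, which instead uses consistent examples to guide the correction.

# Illustrative sketch only: a toy synchronous-SGD loop on a linear model
# where worker gradients arrive with a fixed delay (staleness), plus a
# simple heuristic that down-weights stale gradients. The compensation
# rule here is an assumption for illustration, not the paper's gSGD.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data split across "workers".
n_workers, n_samples, n_features = 4, 400, 5
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)
shards = np.array_split(np.arange(n_samples), n_workers)

def grad(w, idx):
    """Mean squared-error gradient on one worker's shard."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

def train(delay=2, compensate=True, lr=0.05, steps=200):
    w = np.zeros(n_features)
    history = [w.copy()]                      # past parameters, to simulate staleness
    for t in range(steps):
        stale_w = history[max(0, t - delay)]  # workers see delayed parameters
        g = np.mean([grad(stale_w, idx) for idx in shards], axis=0)
        if compensate:
            # Heuristic stand-in for delay compensation: shrink the step
            # when parameters have drifted since the stale snapshot.
            drift = np.linalg.norm(w - stale_w)
            g = g / (1.0 + drift)
        w = w - lr * g
        history.append(w.copy())
    return np.mean((X @ w - y) ** 2)

print("stale, no compensation :", train(compensate=False))
print("stale, compensated     :", train(compensate=True))

Comparing the two calls at the bottom shows how the same stale-gradient loop behaves with and without the damping term; the paper's contribution lies in how the compensation itself is guided by consistent examples, not in the toy heuristic used here.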

Updated: 2021-01-20