Pipelined Backpropagation at Scale: Training Large Models without Batches
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-03-25, DOI: arxiv-2003.11666
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Köster

New hardware can substantially increase the speed and efficiency of deep neural network training. To guide the development of future hardware architectures, it is pertinent to explore the hardware and machine learning properties of alternative training algorithms. In this work we evaluate the use of small batch, fine-grained Pipelined Backpropagation, an asynchronous pipeline parallel training algorithm that has significant hardware advantages. We introduce two methods, Spike Compensation and Linear Weight Prediction, that effectively mitigate the downsides caused by the asynchronicity of Pipelined Backpropagation and outperform existing techniques in our setting. We show that appropriate normalization and small batch sizes can also aid training. With our methods, fine-grained Pipelined Backpropagation using a batch size of one can match the accuracy of SGD for multiple networks trained on CIFAR-10 and ImageNet. Simple scaling rules allow the use of existing hyperparameters for traditional training without additional tuning.
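For intuition about how weight prediction can counteract pipeline asynchrony, below is a minimal Python sketch (not the authors' implementation) of linear weight prediction under SGD with momentum. The function names, the delay model, and the extrapolation rule "predicted weights = current weights minus lr × delay × momentum buffer" are simplifying assumptions for illustration: the forward pass uses weights extrapolated a fixed number of steps ahead, so that by the time the corresponding stale gradient is applied, the true weights have roughly drifted toward the weights that produced it.

```python
import numpy as np

# Hypothetical illustration of linear weight prediction for one parameter
# tensor under SGD with momentum. Names and the delay model are assumptions,
# not the paper's code.

def lwp_forward_weights(w, velocity, lr, delay):
    """Linearly extrapolate weights `delay` optimizer steps into the future,
    assuming each step moves the weights by roughly -lr * velocity."""
    return w - lr * delay * velocity

def sgd_momentum_step(w, velocity, grad, lr, momentum):
    """Plain SGD-with-momentum update applied to the true weights."""
    velocity = momentum * velocity + grad
    w = w - lr * velocity
    return w, velocity

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=4)               # toy parameter vector
    velocity = np.zeros_like(w)
    lr, momentum, delay = 0.1, 0.9, 3    # `delay` stands in for pipeline depth

    for step in range(5):
        # Weights used for the forward pass of a micro-batch whose gradient
        # will only arrive `delay` steps from now.
        w_pred = lwp_forward_weights(w, velocity, lr, delay)

        # Stand-in for that micro-batch's gradient (random in this toy loop).
        grad = rng.normal(size=4)

        # Apply the (stale) gradient as a normal update to the true weights;
        # the prediction above aims to reduce the effective staleness.
        w, velocity = sgd_momentum_step(w, velocity, grad, lr, momentum)
```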

Updated: 2020-10-15