Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2021-06-11. arXiv:2106.06401
Eugene Belilovsky (MILA), Louis Leconte (MLIA, CMAP), Lucas Caccia (MILA), Michael Eickenberg (MLIA), Edouard Oyallon (MLIA)

A commonly cited inefficiency of neural network training using back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate this issue have been proposed. In this context, we consider a simple alternative based on minimal feedback, which we call Decoupled Greedy Learning (DGL). It is based on a classic greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification. We consider an optimization of this objective that permits us to decouple the layer training, allowing the layers or modules of a network to be trained with a potentially linear parallelization. With the use of a replay buffer, we show that this approach can be extended to asynchronous settings, where modules can operate and continue to update with possibly large communication delays. To address bandwidth and memory issues, we propose an approach based on online vector quantization, which drastically reduces the communication bandwidth between modules and the memory required for the replay buffers. We show theoretically and empirically that this approach converges, and we compare it to sequential solvers. We demonstrate the effectiveness of DGL against alternative approaches on the CIFAR-10 dataset and on the large-scale ImageNet dataset.
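To make the decoupling concrete, below is a minimal sketch of greedy layer-wise training with local auxiliary objectives, assuming PyTorch. The two-block CNN, the AuxiliaryHead module, and the hyper-parameters are illustrative placeholders, not the architecture or settings used in the paper; the point is only that each block updates from its own local loss and passes a detached activation downstream.

```python
# Minimal sketch of decoupled greedy layer-wise training (illustrative, not the paper's exact setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryHead(nn.Module):
    """Small local classifier attached to a block so it can compute its own loss."""
    def __init__(self, channels, num_classes=10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, h):
        return self.fc(self.pool(h).flatten(1))

# Two decoupled modules: each block is paired with its own auxiliary head and optimizer.
block1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
block2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
head1, head2 = AuxiliaryHead(64), AuxiliaryHead(128)
opt1 = torch.optim.SGD(list(block1.parameters()) + list(head1.parameters()), lr=0.1)
opt2 = torch.optim.SGD(list(block2.parameters()) + list(head2.parameters()), lr=0.1)

def train_step(x, y):
    # Block 1 updates from its own auxiliary loss only.
    h1 = block1(x)
    loss1 = F.cross_entropy(head1(h1), y)
    opt1.zero_grad(); loss1.backward(); opt1.step()

    # The activation is detached before being handed to block 2, so no gradient
    # (and hence no update locking) crosses the module boundary.
    h1 = h1.detach()
    h2 = block2(h1)
    loss2 = F.cross_entropy(head2(h2), y)
    opt2.zero_grad(); loss2.backward(); opt2.step()
    return loss1.item(), loss2.item()
```

Because each block only needs the detached activation of its predecessor, the two update steps can be placed on different workers and pipelined, which is what makes the potentially linear parallelization mentioned in the abstract possible.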

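The asynchronous extension and the bandwidth reduction can be sketched in the same spirit. The snippet below combines a bounded replay buffer with a simple online vector quantizer, again assuming PyTorch; the codebook size, the running-mean codebook update, the buffer capacity, and the class names (OnlineVectorQuantizer, ReplayBuffer) are illustrative assumptions, not the exact quantizer or buffer analysed in the paper.

```python
# Illustrative sketch: quantized activations stored in a replay buffer for asynchronous training.
import random
from collections import deque
import torch

class OnlineVectorQuantizer:
    """Maps activation vectors to nearest-codeword indices and updates the codebook online."""
    def __init__(self, dim, num_codes=256, lr=0.05):
        self.codebook = torch.randn(num_codes, dim)
        self.lr = lr

    def encode(self, vectors):                       # vectors: (N, dim)
        dists = torch.cdist(vectors, self.codebook)  # (N, num_codes)
        codes = dists.argmin(dim=1)                  # integer indices: cheap to transmit and store
        # Online update: move each selected codeword toward the mean of its assigned vectors.
        for c in codes.unique():
            mask = codes == c
            self.codebook[c] += self.lr * (vectors[mask].mean(dim=0) - self.codebook[c])
        return codes

    def decode(self, codes):
        return self.codebook[codes]                  # approximate reconstruction of the activations

class ReplayBuffer:
    """Bounded FIFO buffer of (codes, label) pairs received from the upstream module."""
    def __init__(self, capacity=1024):
        self.items = deque(maxlen=capacity)

    def push(self, codes, label):
        self.items.append((codes, label))

    def sample(self):
        # Reusing a stored (possibly stale) entry lets the downstream module keep
        # updating even when fresh activations are delayed by slow communication.
        return random.choice(self.items)
```

In this sketch the upstream worker encodes its detached activations into code indices and pushes them, while the downstream worker samples from the buffer and decodes whenever it is ready to step, so the two update rates are decoupled; for convolutional activations one would quantize per-location feature vectors and reshape on the receiving side.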
Updated: 2021-06-14