Dynamic Block-Wise Local Learning Algorithm for Efficient Neural Network Training
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8), Pub Date: 2021-07-26, DOI: 10.1109/tvlsi.2021.3097341
Gwangho Lee, Sunwoo Lee, Dongsuk Jeon

In the backpropagation algorithm, the error computed at the network output must be propagated backward through all layers to update each layer's weights, which makes the training process difficult to parallelize and requires frequent off-chip memory access. Local learning algorithms instead generate error signals locally for weight updates, removing the need to backpropagate error signals. However, prior works rely on large, complex auxiliary networks for reliable training, which incurs substantial computational overhead and undermines the advantages of local learning. In this work, we propose a local learning algorithm that significantly reduces computational complexity while improving training performance. Our algorithm combines multiple consecutive layers into a block and performs local learning block by block, dynamically changing the block boundaries during training. In experiments, our approach achieves 95.68% and 79.42% test accuracy on the CIFAR-10 and CIFAR-100 datasets, respectively, using a small fully connected layer as the auxiliary network, closely matching the performance of the backpropagation algorithm. Multiply-accumulate (MAC) operations and off-chip memory accesses are also reduced by up to 15% and 81%, respectively, compared to backpropagation.
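To make the mechanism concrete, below is a minimal PyTorch sketch of block-wise local learning under stated assumptions: `LocalBlock`, `make_blocks`, `train_step`, the boundary placement, and the optimizer settings are hypothetical illustrations, not the authors' implementation. Each block trains against its own small fully connected auxiliary head, and activations are detached at block boundaries so no error signal crosses blocks; the paper's dynamic boundary schedule is left to the caller here.

```python
import torch
import torch.nn as nn

class LocalBlock(nn.Module):
    """A group of consecutive layers trained with its own local error signal."""
    def __init__(self, layers, num_classes):
        super().__init__()
        self.body = nn.Sequential(*layers)
        # Small fully connected auxiliary network producing the local loss,
        # mirroring the paper's use of a small FC layer as the auxiliary net.
        self.aux = nn.Sequential(nn.Flatten(), nn.LazyLinear(num_classes))

    def forward(self, x):
        return self.body(x)

def make_blocks(layers, boundaries, num_classes):
    """Split a flat layer list into blocks at the given boundary indices.
    The paper adapts these boundaries dynamically during training; the
    schedule itself is left to the caller in this sketch."""
    cuts = [0] + list(boundaries) + [len(layers)]
    return nn.ModuleList(
        LocalBlock(layers[a:b], num_classes) for a, b in zip(cuts, cuts[1:])
    )

def train_step(blocks, optimizers, x, y):
    """One step of block-wise local learning: each block is updated from its
    own auxiliary loss, and the activation is detached at every block
    boundary so no error signal backpropagates across blocks."""
    criterion = nn.CrossEntropyLoss()
    h = x
    losses = []
    for block, opt in zip(blocks, optimizers):
        h = block(h)                        # forward through this block only
        loss = criterion(block.aux(h), y)   # local error from the aux head
        opt.zero_grad()
        loss.backward()                     # gradients confined to the block
        opt.step()
        h = h.detach()                      # cut the graph: no global backprop
        losses.append(loss.item())
    return losses

if __name__ == "__main__":
    # Toy example: a 4-layer MLP split into two blocks at layer index 2.
    layers = [nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU()]
    blocks = make_blocks(layers, boundaries=[2], num_classes=10)
    # LazyLinear needs one dry run to materialize the aux weights before
    # the optimizers are built.
    with torch.no_grad():
        h = torch.randn(8, 32)
        for b in blocks:
            h = b(h)
            b.aux(h)
    opts = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]
    print(train_step(blocks, opts, torch.randn(8, 32), torch.randint(0, 10, (8,))))
```

Because each block's update depends only on its own activations and auxiliary head, the blocks can in principle be trained in a pipelined or parallel fashion, which is the parallelization advantage the abstract attributes to local learning.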

Updated: 2021-08-31