Dynamic Block-Wise Local Learning Algorithm for Efficient Neural Network Training
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8), Pub Date: 2021-07-26, DOI: 10.1109/tvlsi.2021.3097341
Gwangho Lee, Sunwoo Lee, Dongsuk Jeon

In the backpropagation algorithm, the error computed at the network output must be propagated backward through all layers to update each layer's weights, which makes the training process difficult to parallelize and requires frequent off-chip memory access. Local learning algorithms instead generate error signals locally for weight updates, removing the need to backpropagate error signals. However, prior works rely on large, complex auxiliary networks for reliable training, which incurs substantial computational overhead and undermines the advantages of local learning. In this work, we propose a local learning algorithm that significantly reduces computational complexity while improving training performance. Our algorithm combines multiple consecutive layers into a block and performs local learning block by block, dynamically changing the block boundaries during training. In experiments, our approach achieves 95.68% and 79.42% test accuracy on the CIFAR-10 and CIFAR-100 datasets, respectively, using a small fully connected layer as the auxiliary network, closely matching the performance of the backpropagation algorithm. Multiply-accumulate (MAC) operations and off-chip memory accesses are also reduced by up to 15% and 81%, respectively, compared to backpropagation.
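To make the mechanism concrete, below is a minimal PyTorch sketch of block-wise local learning under stated assumptions: `LocalBlock`, `make_blocks`, `train_step`, the boundary placement, and the optimizer settings are hypothetical illustrations, not the authors' implementation. Each block trains against its own small fully connected auxiliary head, and activations are detached at block boundaries so no error signal crosses blocks; the paper's dynamic boundary schedule is left to the caller here.

```python
import torch
import torch.nn as nn

class LocalBlock(nn.Module):
    """A group of consecutive layers trained with its own local error signal."""
    def __init__(self, layers, num_classes):
        super().__init__()
        self.body = nn.Sequential(*layers)
        # Small fully connected auxiliary network producing the local loss,
        # mirroring the paper's use of a small FC layer as the auxiliary net.
        self.aux = nn.Sequential(nn.Flatten(), nn.LazyLinear(num_classes))

    def forward(self, x):
        return self.body(x)

def make_blocks(layers, boundaries, num_classes):
    """Split a flat layer list into blocks at the given boundary indices.
    The paper adapts these boundaries dynamically during training; the
    schedule itself is left to the caller in this sketch."""
    cuts = [0] + list(boundaries) + [len(layers)]
    return nn.ModuleList(
        LocalBlock(layers[a:b], num_classes) for a, b in zip(cuts, cuts[1:])
    )

def train_step(blocks, optimizers, x, y):
    """One step of block-wise local learning: each block is updated from its
    own auxiliary loss, and the activation is detached at every block
    boundary so no error signal backpropagates across blocks."""
    criterion = nn.CrossEntropyLoss()
    h = x
    losses = []
    for block, opt in zip(blocks, optimizers):
        h = block(h)                        # forward through this block only
        loss = criterion(block.aux(h), y)   # local error from the aux head
        opt.zero_grad()
        loss.backward()                     # gradients confined to the block
        opt.step()
        h = h.detach()                      # cut the graph: no global backprop
        losses.append(loss.item())
    return losses

if __name__ == "__main__":
    # Toy example: a 4-layer MLP split into two blocks at layer index 2.
    layers = [nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU()]
    blocks = make_blocks(layers, boundaries=[2], num_classes=10)
    # LazyLinear needs one dry run to materialize the aux weights before
    # the optimizers are built.
    with torch.no_grad():
        h = torch.randn(8, 32)
        for b in blocks:
            h = b(h)
            b.aux(h)
    opts = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]
    print(train_step(blocks, opts, torch.randn(8, 32), torch.randint(0, 10, (8,))))
```

Because each block's update depends only on its own activations and auxiliary head, the blocks can in principle be trained in a pipelined or parallel fashion, which is the parallelization advantage the abstract attributes to local learning.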

Updated: 2021-08-31