Weight Update Skipping: Reducing Training Time for Artificial Neural Networks
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IF 3.7) Pub Date: 2021-11-13, DOI: 10.1109/jetcas.2021.3127907
Pooneh Safayenikoo, Ismail Akturk

Artificial Neural Networks (ANNs) are state-of-the-art techniques in Machine Learning (ML) and have achieved outstanding results in data-intensive applications such as recognition, classification, and segmentation. These networks mostly use deep stacks of convolutional and/or fully connected layers with many filters in each layer, demanding a large amount of data and many tunable hyperparameters to achieve competitive accuracy. As a result, the storage, communication, and computational costs of training (in particular, the time spent on training) become limiting factors for scaling them up. In this paper, we propose a new training methodology for ANNs that exploits the observation that the improvement in accuracy shows temporal variation, which allows us to skip updating weights when that variation is minuscule. During such time windows, we keep updating the bias, which ensures the network still trains and avoids overfitting; however, we selectively skip updating the weights (and their time-consuming computations). This training approach achieves virtually the same accuracy with considerably less computational cost and reduces the time spent on training. We developed two variations of the proposed training method for selectively updating weights, which we call i) Weight Update Skipping (WUS) and ii) Weight Update Skipping with Learning Rate Scheduler (WUS+LR). We evaluate these two approaches on state-of-the-art models, including AlexNet, VGG-11, VGG-16, and ResNet-18, on the CIFAR datasets; we also use the ImageNet dataset for AlexNet, VGG-16, and ResNet-18. On average, compared to the baseline, WUS and WUS+LR reduce training time by 54% and 50% on CPU and by 22% and 21% on GPU for CIFAR-10; by 43% and 35% on CPU and by 22% and 21% on GPU for CIFAR-100; and by 30% and 27% for ImageNet, respectively.
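To make the skipping mechanism concrete, the following is a minimal sketch of the WUS idea, assuming a PyTorch-style training loop. The epoch-level skipping decision, the `eps` threshold on the change in validation accuracy, and the `evaluate` helper are illustrative assumptions rather than the paper's exact criterion; WUS+LR would additionally couple this with a learning-rate scheduler.

```python
# Hypothetical sketch of Weight Update Skipping (WUS), not the authors' code.
import torch
import torch.nn as nn

def train_wus(model, loader, evaluate, epochs, lr=0.01, eps=0.1):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    prev_acc, skip_weights = None, False

    for epoch in range(epochs):
        # During a skipping window, freeze the weights (but not the biases),
        # so their gradients are neither computed nor applied; the biases
        # keep training, which keeps the network learning.
        for name, p in model.named_parameters():
            if not name.endswith("bias"):
                p.requires_grad_(not skip_weights)

        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()  # parameters with no gradient are skipped

        acc = evaluate(model)  # validation accuracy after this epoch (user-supplied)
        if prev_acc is not None:
            # Enter a skipping window while the accuracy improvement is minuscule.
            skip_weights = abs(acc - prev_acc) < eps
        prev_acc = acc
```

In this sketch, disabling `requires_grad` on the weight tensors is what avoids the expensive weight-gradient computation during a skipping window, while bias gradients are still computed and applied.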

Updated: 2021-11-13