Hybrid In-memory Computing Architecture for the Training of Deep Neural Networks
arXiv - CS - Hardware Architecture. Pub Date: 2021-02-10, DOI: arxiv-2102.05271
Vinay Joshi, Wangxin He, Jae-sun Seo, Bipin Rajendran

The cost involved in training deep neural networks (DNNs) on von Neumann architectures has motivated the development of novel solutions for efficient DNN training accelerators. We propose a hybrid in-memory computing (HIC) architecture for the training of DNNs on hardware accelerators that results in memory-efficient inference and outperforms baseline software accuracy in benchmark tasks. We introduce a weight representation technique that exploits both binary and multi-level phase-change memory (PCM) devices, which leads to a memory-efficient inference accelerator. Unlike previous in-memory computing-based implementations, we use a low-precision weight-update accumulator that yields further memory savings. We trained the ResNet-32 network to classify CIFAR-10 images using HIC. For a comparable model size, HIC-based training outperforms the baseline network, trained in floating-point 32-bit (FP32) precision, by leveraging an appropriate network width multiplier. Furthermore, we observe that HIC-based training yields an inference model that is about 50% smaller while achieving accuracy comparable to the baseline. We also show that the temporal drift in PCM devices has a negligible effect on post-training inference accuracy over extended periods (up to a year). Finally, our simulations indicate that HIC-based training naturally ensures that the number of write-erase cycles seen by the devices is a small fraction of the endurance limit of PCM, demonstrating the feasibility of this architecture for realizing hardware platforms that can learn in the field.
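To make the weight-update mechanism more concrete, below is a minimal NumPy sketch of the general idea of pairing quantized, PCM-like device weights with a low-precision update accumulator: gradient contributions are first gathered in the accumulator, and a programming pulse is issued only once a whole device conductance step has accumulated. The number of levels, accumulator width, learning rate, and the toy regression task are illustrative assumptions, not the paper's exact HIC scheme; in particular, the binary/multi-level hybrid weight representation is not modeled here.

import numpy as np

# Sketch under stated assumptions: weights live on simulated PCM-like devices
# with a small number of conductance levels, while gradient updates are first
# gathered in a low-precision per-weight accumulator. A programming pulse is
# issued only when the accumulated update exceeds one device step.

N_LEVELS = 16                        # assumed multi-level PCM resolution
W_MAX = 1.0                          # assumed weight range [-W_MAX, W_MAX]
STEP = 2 * W_MAX / (N_LEVELS - 1)    # conductance change per programming pulse
ACC_BITS = 8                         # assumed low-precision accumulator width

def quantize_to_device(w):
    """Map ideal weights to the nearest representable device level."""
    return np.clip(np.round(w / STEP) * STEP, -W_MAX, W_MAX)

def accumulate_and_program(w_device, acc, grad, lr):
    """One training step: add the scaled gradient to the low-precision
    accumulator, then transfer whole device steps to the PCM weights."""
    acc_scale = STEP / (2 ** (ACC_BITS - 1))          # accumulator LSB
    acc = acc + np.round(-lr * grad / acc_scale) * acc_scale
    n_pulses = np.trunc(acc / STEP)                   # whole steps to program
    w_device = np.clip(w_device + n_pulses * STEP, -W_MAX, W_MAX)
    acc = acc - n_pulses * STEP                       # keep the residue
    return w_device, acc

# Toy usage: fit a linear map entirely with device-level weights and the
# low-precision accumulator.
rng = np.random.default_rng(0)
x = rng.standard_normal((256, 8))
w_true = rng.uniform(-0.8, 0.8, size=(8, 1))
y = x @ w_true

w = quantize_to_device(rng.standard_normal((8, 1)) * 0.1)
acc = np.zeros_like(w)
for _ in range(200):
    grad = 2 * x.T @ (x @ w - y) / len(x)
    w, acc = accumulate_and_program(w, acc, grad, lr=0.05)
print("final MSE:", float(np.mean((x @ w - y) ** 2)))

Because only whole device steps are ever programmed, most updates touch the accumulator rather than the PCM devices; this is the intuition behind both the memory savings and the low write-erase cycle counts described in the abstract.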

Updated: 2021-02-11