Hybrid In-memory Computing Architecture for the Training of Deep Neural Networks
arXiv - CS - Hardware Architecture. Pub Date: 2021-02-10, DOI: arxiv-2102.05271
Vinay Joshi, Wangxin He, Jae-sun Seo, Bipin Rajendran

The cost involved in training deep neural networks (DNNs) on von Neumann architectures has motivated the development of novel solutions for efficient DNN training accelerators. We propose a hybrid in-memory computing (HIC) architecture for the training of DNNs on hardware accelerators that results in memory-efficient inference and outperforms baseline software accuracy in benchmark tasks. We introduce a weight representation technique that exploits both binary and multi-level phase-change memory (PCM) devices, which leads to a memory-efficient inference accelerator. Unlike previous in-memory computing-based implementations, we use a low-precision weight-update accumulator that yields further memory savings. We trained the ResNet-32 network to classify CIFAR-10 images using HIC. For a comparable model size, HIC-based training outperforms the baseline network, trained in floating-point 32-bit (FP32) precision, by leveraging an appropriate network width multiplier. Furthermore, we observe that HIC-based training yields an inference model that is about 50% smaller while achieving accuracy comparable to the baseline. We also show that the temporal drift in PCM devices has a negligible effect on post-training inference accuracy over extended periods (up to a year). Finally, our simulations indicate that HIC-based training naturally ensures that the number of write-erase cycles seen by the devices is a small fraction of the endurance limit of PCM, demonstrating the feasibility of this architecture for realizing hardware platforms that can learn in the field.
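To make the weight-update mechanism more concrete, below is a minimal NumPy sketch of the general idea of pairing quantized, PCM-like device weights with a low-precision update accumulator: gradient contributions are first gathered in the accumulator, and a programming pulse is issued only once a whole device conductance step has accumulated. The number of levels, accumulator width, learning rate, and the toy regression task are illustrative assumptions, not the paper's exact HIC scheme; in particular, the binary/multi-level hybrid weight representation is not modeled here.

import numpy as np

# Sketch under stated assumptions: weights live on simulated PCM-like devices
# with a small number of conductance levels, while gradient updates are first
# gathered in a low-precision per-weight accumulator. A programming pulse is
# issued only when the accumulated update exceeds one device step.

N_LEVELS = 16                        # assumed multi-level PCM resolution
W_MAX = 1.0                          # assumed weight range [-W_MAX, W_MAX]
STEP = 2 * W_MAX / (N_LEVELS - 1)    # conductance change per programming pulse
ACC_BITS = 8                         # assumed low-precision accumulator width

def quantize_to_device(w):
    """Map ideal weights to the nearest representable device level."""
    return np.clip(np.round(w / STEP) * STEP, -W_MAX, W_MAX)

def accumulate_and_program(w_device, acc, grad, lr):
    """One training step: add the scaled gradient to the low-precision
    accumulator, then transfer whole device steps to the PCM weights."""
    acc_scale = STEP / (2 ** (ACC_BITS - 1))          # accumulator LSB
    acc = acc + np.round(-lr * grad / acc_scale) * acc_scale
    n_pulses = np.trunc(acc / STEP)                   # whole steps to program
    w_device = np.clip(w_device + n_pulses * STEP, -W_MAX, W_MAX)
    acc = acc - n_pulses * STEP                       # keep the residue
    return w_device, acc

# Toy usage: fit a linear map entirely with device-level weights and the
# low-precision accumulator.
rng = np.random.default_rng(0)
x = rng.standard_normal((256, 8))
w_true = rng.uniform(-0.8, 0.8, size=(8, 1))
y = x @ w_true

w = quantize_to_device(rng.standard_normal((8, 1)) * 0.1)
acc = np.zeros_like(w)
for _ in range(200):
    grad = 2 * x.T @ (x @ w - y) / len(x)
    w, acc = accumulate_and_program(w, acc, grad, lr=0.05)
print("final MSE:", float(np.mean((x @ w - y) ** 2)))

Because only whole device steps are ever programmed, most updates touch the accumulator rather than the PCM devices; this is the intuition behind both the memory savings and the low write-erase cycle counts described in the abstract.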

Updated: 2021-02-11