A Runtime Reconfigurable Design of Compute-in-Memory–Based Hardware Accelerator for Deep Learning Inference
ACM Transactions on Design Automation of Electronic Systems (IF 1.4), Pub Date: 2021-06-28, DOI: 10.1145/3460436
Anni Lu, Xiaochen Peng, Yandong Luo, Shanshi Huang, Shimeng Yu

Compute-in-memory (CIM) is an attractive solution to address the “memory wall” challenge posed by the extensive computation in deep learning hardware accelerators. In a custom ASIC design, a specific chip instance is restricted to a specific network at runtime, yet the hardware development cycle normally lags far behind the emergence of new algorithms. Although some reported CIM-based architectures claim to adapt to different deep neural network (DNN) models, few details about the dataflow or control have been disclosed to support that claim. An instruction set architecture (ISA) could provide high flexibility, but its complexity would be an obstacle to efficiency. In this article, a runtime reconfigurable design methodology for CIM-based accelerators is proposed to support a class of convolutional neural networks running on one prefabricated chip instance with ASIC-like efficiency. First, several design aspects are investigated: (1) the reconfigurable weight mapping method; (2) the input side of data transmission, mainly the weight reloading; and (3) the output side of data processing, mainly the reconfigurable accumulation. Then, a system-level performance benchmark is performed for the inference of different DNN models, such as VGG-8 on the CIFAR-10 dataset and AlexNet, GoogLeNet, ResNet-18, and DenseNet-121 on the ImageNet dataset, to measure the trade-offs among runtime reconfigurability, chip area, memory utilization, throughput, and energy efficiency.
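The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the two ideas it names, reconfigurable weight mapping onto fixed CIM subarrays and output-side accumulation of partial sums, not the authors' design. The 128x128 subarray size, the layer shape, and the function names are assumptions chosen for illustration.

```python
import numpy as np

SUBARRAY_ROWS, SUBARRAY_COLS = 128, 128   # hypothetical crossbar dimensions

def map_layer_to_subarrays(weights_2d):
    """Tile an unrolled (K*K*C_in, C_out) weight matrix onto fixed-size subarrays."""
    rows, cols = weights_2d.shape
    tiles = {}
    for r0 in range(0, rows, SUBARRAY_ROWS):
        for c0 in range(0, cols, SUBARRAY_COLS):
            tile = np.zeros((SUBARRAY_ROWS, SUBARRAY_COLS))
            block = weights_2d[r0:r0 + SUBARRAY_ROWS, c0:c0 + SUBARRAY_COLS]
            tile[:block.shape[0], :block.shape[1]] = block   # zero-pad edge tiles
            tiles[(r0 // SUBARRAY_ROWS, c0 // SUBARRAY_COLS)] = tile
    return tiles

def cim_matvec(tiles, activations):
    """Sum partial products from subarrays sharing the same output columns
    (a software stand-in for output-side accumulation)."""
    n_row_tiles = max(k[0] for k in tiles) + 1
    n_col_tiles = max(k[1] for k in tiles) + 1
    out = np.zeros(n_col_tiles * SUBARRAY_COLS)
    padded_act = np.zeros(n_row_tiles * SUBARRAY_ROWS)
    padded_act[:activations.size] = activations
    for (ri, ci), tile in tiles.items():
        vec = padded_act[ri * SUBARRAY_ROWS:(ri + 1) * SUBARRAY_ROWS]
        out[ci * SUBARRAY_COLS:(ci + 1) * SUBARRAY_COLS] += vec @ tile
    return out

# Example: a 3x3 conv layer with 64 input and 96 output channels, unrolled to 576x96.
w = np.random.randn(3 * 3 * 64, 96)
x = np.random.randn(3 * 3 * 64)          # one im2col input column
tiles = map_layer_to_subarrays(w)
assert np.allclose(cim_matvec(tiles, x)[:96], x @ w)
```

Because the subarray size stays fixed, remapping a different layer or network only changes the tiling and the set of partial sums to accumulate, which is the kind of flexibility the reconfigurable mapping and accumulation are meant to provide.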
