Structured Pruning of RRAM Crossbars for Efficient In-Memory Computing Acceleration of Deep Neural Networks
IEEE Transactions on Circuits and Systems II: Express Briefs ( IF 4.4 ) Pub Date : 2021-03-26 , DOI: 10.1109/tcsii.2021.3069011
Jian Meng , Li Yang , Xiaochen Peng , Shimeng Yu , Deliang Fan , Jae-Sun Seo

The high computational complexity and large parameter counts of deep neural networks (DNNs) impose a heavy burden on deep learning hardware design, limiting efficient storage and deployment. With the advantages of high-density storage, non-volatility, and low energy consumption, resistive RAM (RRAM) crossbar based in-memory computing (IMC) has emerged as a promising technique for DNN acceleration. To fully exploit the efficiency of crossbar-based IMC, a systematic compression design that considers both hardware and algorithm is necessary. In this brief, we present a system-level design that considers low-precision weights and activations, structured pruning, and RRAM crossbar mapping. The proposed multi-group Lasso algorithm and hardware implementation have been evaluated on ResNet/VGG models for the CIFAR-10/ImageNet datasets. With a fully quantized 4-bit ResNet-18 for CIFAR-10, we achieve up to 65.4× compression compared to the full-precision software baseline, and 7× energy reduction compared to the 4-bit unpruned RRAM IMC hardware, with 1.1% accuracy loss. For a fully quantized 4-bit ResNet-18 model on the ImageNet dataset, we achieve up to 10.9× structured compression with 1.9% accuracy degradation.
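The pipeline described above combines low-precision quantization with Lasso-style structured pruning whose groups align with crossbar columns, so that zeroed groups translate into whole columns that can be skipped in hardware. As an illustration only — the paper's exact multi-group Lasso formulation, group shapes, and crossbar width are not given here, and the names below (`XBAR_COLS`, `group_lasso_penalty`, `prune_column_groups`) are hypothetical — a minimal NumPy sketch:

```python
import numpy as np

XBAR_COLS = 8  # assumed crossbar column-group width (illustrative, not from the paper)

def quantize_sym(x, bits=4):
    """Symmetric uniform quantization to `bits` bits (e.g. 4-bit weights).
    Returns the dequantized values (quantize-dequantize round trip)."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit
    max_abs = np.abs(x).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def group_lasso_penalty(w, group_size=XBAR_COLS):
    """Sum of L2 norms over column groups of a 2-D weight matrix.
    Added to the training loss, this regularizer drives whole groups
    toward zero, so entire crossbar columns can be pruned away."""
    rows, cols = w.shape
    groups = w.reshape(rows, cols // group_size, group_size)
    return np.sqrt((groups ** 2).sum(axis=(0, 2))).sum()

def prune_column_groups(w, threshold, group_size=XBAR_COLS):
    """Zero out column groups whose L2 norm falls below `threshold`."""
    rows, cols = w.shape
    groups = w.reshape(rows, cols // group_size, group_size).copy()
    norms = np.sqrt((groups ** 2).sum(axis=(0, 2)))
    groups[:, norms < threshold, :] = 0.0
    return groups.reshape(rows, cols)
```

Grouping by crossbar column width (rather than by arbitrary weight positions, as in unstructured pruning) is what makes the sparsity exploitable by the IMC hardware: a fully zeroed group removes a whole column of RRAM cells and its peripheral circuitry from the computation.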

Updated: 2021-05-04