SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network
arXiv - CS - Hardware Architecture Pub Date : 2021-03-02 , DOI: arxiv-2103.01705
Fangxin Liu, Wenbo Zhao, Yilong Zhao, Zongwu Wang, Tao Yang, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang

The Resistive Random-Access Memory (ReRAM) crossbar is a promising substrate for deep neural network (DNN) accelerators, thanks to its in-memory and in-situ analog computing ability for Vector-Matrix Multiplication-and-Accumulation (VMM). However, it is challenging for the crossbar architecture to exploit sparsity in DNNs: the tightly coupled crossbar structure inevitably requires complex and costly control to exploit fine-grained sparsity. As a countermeasure, we develop a novel ReRAM-based DNN accelerator, named the Sparse-Multiplication-Engine (SME), based on a hardware/software co-design framework. First, we orchestrate the bit-sparse pattern to increase the density of bit sparsity on top of existing quantization methods. Second, we propose a novel weight mapping mechanism that slices the bits of a weight across crossbars and splices the activation results in the peripheral circuits. This mechanism decouples the tightly coupled crossbar structure and accumulates sparsity within the crossbars. Finally, a squeeze-out scheme empties the crossbars that, after the previous two steps, are mapped with only a few non-zero bits. We design the SME architecture and discuss its use with other quantization methods and different ReRAM cell technologies. Compared with prior state-of-the-art designs, SME reduces crossbar usage by up to 8.7x and 2.1x on ResNet-50 and MobileNet-v2, respectively, with less than 0.3% accuracy drop on ImageNet.
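The bit-slicing and squeeze-out ideas in the abstract can be illustrated in software. The sketch below is a minimal, hypothetical model (not the paper's actual pipeline): an unsigned quantized weight matrix is sliced into per-bit planes, each plane standing in for (part of) a crossbar; planes that contain no non-zero bits are then "squeezed out", since an all-zero crossbar contributes nothing to the VMM. All function names here are assumptions for illustration.

```python
import numpy as np

def bit_slice(weights_q, n_bits=8):
    """Slice an unsigned quantized weight matrix into per-bit planes.

    On a ReRAM accelerator, each plane would be mapped to (part of)
    a crossbar; here it is just a 0/1 matrix per bit position.
    """
    planes = [(weights_q >> b) & 1 for b in range(n_bits)]
    return np.stack(planes)  # shape: (n_bits, rows, cols)

def squeeze_out(planes):
    """Keep only (bit_position, plane) pairs with at least one non-zero
    bit, so all-zero crossbars are skipped entirely."""
    return [(b, p) for b, p in enumerate(planes) if p.any()]

def bit_sparsity(planes):
    """Fraction of zero bits across all planes."""
    return 1.0 - planes.mean()

rng = np.random.default_rng(0)
# Hypothetical 8-bit quantized weights with small magnitudes: the high
# bit planes are all zero, which the squeeze-out step can eliminate.
w = rng.integers(0, 16, size=(4, 4), dtype=np.uint8)  # only low 4 bits used
planes = bit_slice(w, n_bits=8)
kept = squeeze_out(planes)
print(f"planes: {planes.shape[0]} -> {len(kept)}, "
      f"bit sparsity: {bit_sparsity(planes):.2f}")
```

Reconstructing the weights from the kept planes, `sum(p << b)` over the retained `(b, p)` pairs, recovers `w` exactly, which is why dropping all-zero planes is lossless; the real SME additionally reshapes the bit-sparse pattern so that more planes become droppable.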

Updated: 2021-03-03