SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network
arXiv - CS - Hardware Architecture Pub Date : 2021-03-02 , DOI: arxiv-2103.01705
Fangxin Liu, Wenbo Zhao, Yilong Zhao, Zongwu Wang, Tao Yang, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang

The Resistive Random-Access Memory (ReRAM) crossbar is a promising substrate for deep neural network (DNN) accelerators, thanks to its in-memory and in-situ analog computing ability for Vector-Matrix Multiplication-and-Accumulation (VMM). However, it is challenging for the crossbar architecture to exploit sparsity in DNNs: the tightly coupled crossbar structure inevitably requires complex and costly control to exploit fine-grained sparsity. As a countermeasure, we develop a novel ReRAM-based DNN accelerator, named the Sparse-Multiplication-Engine (SME), based on a hardware/software co-design framework. First, we orchestrate the bit-sparse pattern to increase the density of bit sparsity on top of existing quantization methods. Second, we propose a novel weight mapping mechanism that slices the bits of a weight across crossbars and splices the activation results in the peripheral circuits. This mechanism decouples the tightly coupled crossbar structure and accumulates sparsity within the crossbars. Finally, a squeeze-out scheme empties the crossbars that, after the previous two steps, are mapped with only a few non-zero bits. We design the SME architecture and discuss its use with other quantization methods and different ReRAM cell technologies. Compared with prior state-of-the-art designs, SME reduces crossbar usage by up to 8.7x and 2.1x on ResNet-50 and MobileNet-v2, respectively, with less than 0.3% accuracy drop on ImageNet.
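The bit-slicing and squeeze-out ideas in the abstract can be illustrated in software. The sketch below is a minimal, hypothetical model (not the paper's actual pipeline): an unsigned quantized weight matrix is sliced into per-bit planes, each plane standing in for (part of) a crossbar; planes that contain no non-zero bits are then "squeezed out", since an all-zero crossbar contributes nothing to the VMM. All function names here are assumptions for illustration.

```python
import numpy as np

def bit_slice(weights_q, n_bits=8):
    """Slice an unsigned quantized weight matrix into per-bit planes.

    On a ReRAM accelerator, each plane would be mapped to (part of)
    a crossbar; here it is just a 0/1 matrix per bit position.
    """
    planes = [(weights_q >> b) & 1 for b in range(n_bits)]
    return np.stack(planes)  # shape: (n_bits, rows, cols)

def squeeze_out(planes):
    """Keep only (bit_position, plane) pairs with at least one non-zero
    bit, so all-zero crossbars are skipped entirely."""
    return [(b, p) for b, p in enumerate(planes) if p.any()]

def bit_sparsity(planes):
    """Fraction of zero bits across all planes."""
    return 1.0 - planes.mean()

rng = np.random.default_rng(0)
# Hypothetical 8-bit quantized weights with small magnitudes: the high
# bit planes are all zero, which the squeeze-out step can eliminate.
w = rng.integers(0, 16, size=(4, 4), dtype=np.uint8)  # only low 4 bits used
planes = bit_slice(w, n_bits=8)
kept = squeeze_out(planes)
print(f"planes: {planes.shape[0]} -> {len(kept)}, "
      f"bit sparsity: {bit_sparsity(planes):.2f}")
```

Reconstructing the weights from the kept planes, `sum(p << b)` over the retained `(b, p)` pairs, recovers `w` exactly, which is why dropping all-zero planes is lossless; the real SME additionally reshapes the bit-sparse pattern so that more planes become droppable.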

Updated: 2021-03-03