MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks
arXiv - CS - Emerging Technologies Pub Date : 2020-10-24 , DOI: arxiv-2010.12861
Syuan-Hao Sie, Jye-Luen Lee, Yi-Ren Chen, Chih-Cheng Lu, Chih-Cheng Hsieh, Meng-Fan Chang, Kea-Tiong Tang

Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overhead and substantial computation cost of CNNs are problematic for hardware accelerators. The computing-in-memory (CIM) architecture has demonstrated great potential for efficiently computing large-scale matrix-vector multiplications. However, the intensive multiply-and-accumulate (MAC) operations executed at the crossbar array and the limited capacity of CIM macros remain bottlenecks for further improvements in energy efficiency and throughput. To reduce computation costs, network pruning and quantization are two widely studied compression methods for shrinking the model size. However, most model compression algorithms can only be implemented in digital CNN accelerators. For implementation in a static random access memory (SRAM) CIM-based accelerator, the model compression algorithm must account for the hardware limitations of CIM macros, such as the number of word lines and bit lines that can be turned on simultaneously, as well as how weights are mapped to the SRAM CIM macro. In this study, a software and hardware co-design approach is proposed to design an SRAM CIM-based CNN accelerator together with an SRAM CIM-aware model compression algorithm. To avoid the high-precision MAC operations required by batch normalization (BN), a quantization algorithm that fuses BN into the weights is proposed. Furthermore, to reduce the number of network parameters, a sparsification algorithm that accounts for the CIM architecture is proposed. Finally, MARS, a CIM-based CNN accelerator that utilizes multiple SRAM CIM macros as processing units and supports sparse neural networks, is proposed.
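The two compression ideas the abstract names can be sketched concretely. The first is standard BN folding: a batch-normalization layer applies an affine transform per output channel, so its scale and shift can be absorbed into the preceding convolution's weights and bias, eliminating the high-precision MAC that BN would otherwise require at inference. The second is structured pruning aligned to the macro: since a CIM macro activates whole groups of word lines at once, weights mapped to the same word-line group are kept or zeroed together. Both functions below are minimal illustrative sketches under these standard formulations, not the paper's exact algorithms; `group_size` and the L1-norm ranking criterion are assumptions.

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding conv layer.

    Standard folding identities (per output channel o):
        w'[o] = gamma[o] / sqrt(var[o] + eps) * w[o]
        b'[o] = gamma[o] * (b[o] - mean[o]) / sqrt(var[o] + eps) + beta[o]

    w: (out_ch, in_ch, kh, kw) conv weights, b: (out_ch,) conv bias.
    """
    scale = gamma / np.sqrt(var + eps)
    w_folded = w * scale[:, None, None, None]
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded

def prune_wordline_groups(w, group_size, keep_ratio):
    """Hypothetical CIM-aware structured pruning sketch.

    Weights mapped to the same group of word lines are kept or zeroed
    together, so pruned rows never occupy crossbar rows. Groups are ranked
    by total L1 norm (an assumption); the weakest are zeroed.

    w: (rows, cols) weight matrix as it would be mapped onto the macro.
    """
    n_groups = w.shape[0] // group_size
    body = w[:n_groups * group_size]
    # One importance score per word-line group.
    scores = np.abs(body).reshape(n_groups, group_size, -1).sum(axis=(1, 2))
    n_keep = max(1, int(round(n_groups * keep_ratio)))
    keep = np.argsort(scores)[-n_keep:]
    mask = np.zeros(n_groups, dtype=bool)
    mask[keep] = True
    w_pruned = w.copy()
    # Zero every row of each pruned group; any tail rows are left as-is.
    w_pruned[:n_groups * group_size] *= np.repeat(mask, group_size)[:, None]
    return w_pruned
```

Folding is lossless for a frozen BN layer (same output to floating-point precision), which is why it is a natural pre-step before quantizing weights for a CIM macro; the group-wise zero pattern lets the accelerator skip entire word-line activations rather than scattering zeros that the crossbar would still have to read.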

Updated: 2020-10-27