SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ( IF 2.9 ) Pub Date : 2022-05-04 , DOI: 10.1109/tcad.2022.3172600
Fengbin Tu 1 , Yiqi Wang 2 , Ling Liang 1 , Yufei Ding 3 , Leibo Liu 2 , Shaojun Wei 2 , Shouyi Yin 2 , Yuan Xie 1
Affiliation  

Processing-in-memory (PIM) is a promising architecture for neural network (NN) acceleration. Most previous PIMs are based on analog computing, so their accuracy and memory cell array utilization are limited by analog deviation and ADC overhead. Digital PIM is an emerging type of PIM architecture that integrates digital logic into memory cells, allowing full utilization of the cell array without accuracy loss. However, digital PIM's rigid crossbar architecture and full-array activation raise new challenges for sparse NN acceleration: conventional unstructured or structured sparsity performs poorly on both the weight and input sides of digital PIM. We take advantage of digital PIM's bit-serial processing and in-memory customization to tackle these challenges by co-designing the sparse algorithm, multiplication dataflow, and PIM architecture. At the algorithm level, we propose double-broadcast hybrid-grained pruning to exploit weight sparsity with a better balance between accuracy and efficiency. At the dataflow level, we propose a bit-serial Booth in-SRAM multiplication dataflow for stable acceleration from the input side. At the architecture level, we design a sparse digital PIM (SDP) accelerator with customized SRAM-PIM macros to support the proposed techniques. SDP achieves 3.59×, 8.15×, and 3.11× area efficiency, and 6.95×, 29.44×, and 39.40× energy savings, over the state-of-the-art sparse NN architectures SIGMA, SRE, and Bit Prudent, respectively.
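The abstract does not detail the Booth dataflow, but the core idea behind Booth-based bit-serial multiplication is easy to illustrate. The sketch below is a generic radix-4 Booth recoding of the multiplier (the "input" operand), not the paper's exact in-SRAM dataflow: each Booth digit falls in {-2, -1, 0, 1, 2}, and zero digits need no partial-product cycle, which is the kind of input-side redundancy a bit-serial PIM macro can skip. All function names here are illustrative, not from the paper.

```python
def booth_radix4_digits(b, bits=8):
    """Recode the signed multiplier b into radix-4 Booth digits.

    Bits are scanned in overlapping triplets (b_{i+1}, b_i, b_{i-1});
    each triplet maps to the digit b_i + b_{i-1} - 2*b_{i+1}.
    Returns digits least-significant first; zero digits can be
    skipped by a digit-serial datapath (input-side sparsity).
    """
    x = b & ((1 << bits) - 1)       # two's-complement bit pattern
    x <<= 1                          # append the implicit b_{-1} = 0
    digits = []
    for i in range(0, bits, 2):
        t = (x >> i) & 0b111         # triplet b_{i+1} b_i b_{i-1}
        digits.append(((t >> 1) & 1) + (t & 1) - 2 * ((t >> 2) & 1))
    return digits


def booth_multiply(a, b, bits=8):
    """Multiply a by signed b digit-serially via Booth digits."""
    acc = 0
    for k, d in enumerate(booth_radix4_digits(b, bits)):
        if d != 0:                   # zero digits cost no cycle
            acc += (d * a) << (2 * k)
    return acc


# A bare bit-serial scheme needs one cycle per multiplier bit;
# radix-4 Booth halves that to one digit per two bits, and zero
# digits can be skipped entirely.
assert booth_multiply(5, 3) == 15
assert booth_multiply(5, -3) == -15
```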

Updated: 2022-05-04