Z-PIM: A Sparsity-Aware Processing-in-Memory Architecture With Fully Variable Weight Bit-Precision for Energy-Efficient Deep Neural Networks
IEEE Journal of Solid-State Circuits ( IF 4.6 ) Pub Date : 2021-01-27 , DOI: 10.1109/jssc.2020.3039206
Ji-Hoon Kim , Juhyoung Lee , Jinsu Lee , Jaehoon Heo , Joo-Young Kim

We present an energy-efficient processing-in-memory (PIM) architecture named Z-PIM that supports both sparsity handling and fully variable bit-precision in weight data for energy-efficient deep neural networks. Z-PIM adopts bit-serial arithmetic, which performs a multiplication bit by bit over multiple cycles to reduce the complexity of the operation in a single cycle and to provide flexibility in bit-precision. To this end, it employs a zero-skipping convolution SRAM, which performs in-memory AND operations based on custom 8T-SRAM cells and channel-wise accumulations, and a diagonal accumulation SRAM, which performs bit- and spatial-wise accumulation on the channel-wise accumulation results using diagonal logic and adders to produce the final convolution outputs. We propose a hierarchical bitline structure for energy-efficient weight bit pre-charging and computational readout that reduces the parasitic capacitances of the bitlines. Its charge-reuse scheme reduces the switching rate by 95.42% for the convolution layers of the VGG-16 model. In addition, Z-PIM's channel-wise data mapping enables sparsity handling by skip-reading the input channels with zero weight. Its read-operation pipelining, enabled by read-sequence scheduling, improves the throughput by 66.1%. The Z-PIM chip is fabricated in a 65-nm CMOS process on a 7.568-mm² die and consumes an average power of 5.294 mW at a 1.0-V supply and a 200-MHz frequency. It achieves 0.31–49.12-TOPS/W energy efficiency for convolution operations as the weight sparsity and bit-precision vary from 0.1 to 0.9 and from 1 to 16 bits, respectively. For a figure of merit considering input bit-width, weight bit-width, and energy efficiency, Z-PIM shows more than a 2.1-times improvement over state-of-the-art PIM implementations.
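The bit-serial, zero-skipping scheme described above can be modeled in software. The sketch below is purely illustrative (the function name and structure are assumptions, not from the Z-PIM paper): each "cycle" processes one weight bit position, the per-channel AND stands in for the in-memory AND of the 8T-SRAM cells, channels with zero weight are skipped entirely, and shifted partial sums model the bit-wise accumulation performed by the diagonal accumulation SRAM.

```python
# Hypothetical software model of bit-serial multiply-accumulate with
# zero-weight skipping; names and structure are illustrative only.

def bitserial_mac(inputs, weights, weight_bits=8):
    """Compute sum(inputs[c] * weights[c]) one weight bit at a time.

    Channels whose weight is zero are skipped entirely, mirroring the
    skip-reading of zero-weight input channels (sparsity handling).
    """
    acc = 0
    for b in range(weight_bits):            # one weight bit per "cycle"
        partial = 0
        for x, w in zip(inputs, weights):
            if w == 0:                      # zero-skipping: no read, no AND
                continue
            partial += x * ((w >> b) & 1)   # in-memory AND + channel-wise sum
        acc += partial << b                 # bit-wise (shifted) accumulation
    return acc

# Result matches an ordinary dot product: 3*6 + 0*0 + 5*9 = 63
print(bitserial_mac([3, 0, 5], [6, 0, 9], weight_bits=4))
```

Because the weight bit-width is just the loop bound, precision is fully variable: reducing `weight_bits` directly removes cycles, which is the source of the energy/precision trade-off reported in the abstract.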

Updated: 2021-03-26