MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators
IEEE Transactions on Circuits and Systems I: Regular Papers (IF 5.1) Pub Date: 2021-04-02, DOI: 10.1109/tcsi.2021.3064033
Shamma Nasrin, Diaa Badawi, Ahmet Enis Cetin, Wilfred Gomes, Amit Ranjan Trivedi

We propose a co-design approach for compute-in-memory inference for deep neural networks (DNNs). We use multiplication-free function approximators based on the $\ell _{1}$ norm together with a co-adapted processing array and compute flow. Using this approach, we overcome many deficiencies of the current state of the art in in-SRAM DNN processing, such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high-precision analog-to-digital converters (ADCs), limited support for multi-bit-precision weights, and limited vector-scale parallelism. Our co-adapted implementation extends seamlessly to multi-bit-precision weights, requires no DACs, and scales easily to higher vector-scale parallelism. We also propose an SRAM-immersed successive-approximation ADC (SA-ADC) that exploits the parasitic capacitance of the SRAM array's bit lines as its capacitive DAC. Since the dominant area overhead in an SA-ADC comes from its capacitive DAC, exploiting the intrinsic parasitics of the SRAM array allows a low-area implementation of the SA-ADC within the SRAM. Our $8\times 62$ SRAM macro, which requires a 5-bit ADC, achieves ~105 tera operations per second per watt (TOPS/W) with 8-bit input/weight processing at 45 nm CMOS. Our $8\times 30$ SRAM macro, which requires a 4-bit ADC, achieves ~84 TOPS/W. SRAM macros that require lower ADC precision are more tolerant of process variability but also deliver lower TOPS/W. We evaluated the accuracy and performance of the proposed network on the MNIST, CIFAR10, and CIFAR100 datasets, choosing network configurations that adaptively mix multiplication-free and regular operators. These configurations use the multiplication-free operator for more than 85% of the total operations and achieve 98.6% accuracy on MNIST, 90.2% on CIFAR10, and 66.9% on CIFAR100.
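For intuition, one common $\ell_{1}$-norm-based multiplication-free operator from the related literature replaces each product $x_i w_i$ with $\operatorname{sign}(x_i w_i)(|x_i| + |w_i|)$, so the accumulation depends only on signs and magnitudes (additions), and the operator applied to identical vectors yields twice the $\ell_1$ norm. The exact operator used in the paper may differ; this is a minimal illustrative sketch, not the authors' implementation:

```python
import numpy as np

def mf_dot(x, w):
    """Multiplication-free 'dot product' sketch: each elementwise product
    x_i * w_i is replaced by sign(x_i * w_i) * (|x_i| + |w_i|), so only
    sign logic and additions are needed (no multipliers)."""
    return float(np.sum(np.sign(x) * np.sign(w) * (np.abs(x) + np.abs(w))))

x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.25, -0.5])

print(mf_dot(x, w))       # sign-corrected l1-style accumulation: -2.25
print(float(np.dot(x, w)))  # ordinary multiply-accumulate, for comparison: -0.75

# The operator induces the l1 norm: mf_dot(x, x) == 2 * ||x||_1
print(mf_dot(x, x))       # 7.0 == 2 * (0.5 + 1.0 + 2.0)
```

Networks built on such operators trade exact multiply-accumulate arithmetic for hardware-friendly sign-and-add computation, which is what makes the in-SRAM mapping above practical.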
Since most of the operations in the considered configurations are based on the proposed SRAM macros, the efficiency benefits of our compute-in-memory approach translate broadly to the system level.
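The SRAM-immersed SA-ADC described above uses the bit-line parasitic capacitance as its capacitive DAC; the successive-approximation control itself is a standard binary search over the code space. A minimal behavioral sketch with an idealized DAC (not the paper's circuit, and the voltage model here is an assumption for illustration):

```python
def sar_adc(vin, vref, nbits):
    """Behavioral successive-approximation ADC: binary search from the
    MSB down. Each step trials the next bit through the (here ideal)
    capacitive DAC and keeps it if the DAC voltage stays at or below vin."""
    code = 0
    for bit in range(nbits - 1, -1, -1):
        trial = code | (1 << bit)
        # comparator decision: does the DAC output for 'trial' exceed vin?
        if trial * vref / (1 << nbits) <= vin:
            code = trial  # keep the bit
    return code

# 5-bit conversion of 0.40 V against a 1.0 V reference:
# 0.40 * 32 = 12.8, which truncates to code 12.
print(sar_adc(0.40, 1.0, 5))  # 12
```

An n-bit conversion needs only n comparator decisions, which is why the DAC (not the logic) dominates SA-ADC area, and why reusing the bit-line parasitics as that DAC saves so much area.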

Updated: 2021-04-20