A Reconfigurable 16Kb AND8T SRAM Macro With Improved Linearity for Multibit Compute-In Memory of Artificial Intelligence Edge Devices, IEEE Journal on Emerging and Selected Topics in Circuits and Systems

IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IF 3.7), Pub Date: 2022-04-18, DOI: 10.1109/jetcas.2022.3168571
Vishal Sharma, Ju-Eon Kim, Hyunjoon Kim, Lu Lu, Tony Tae-Hyoung Kim

Compute-in-memory (CIM) is a promising candidate for performing energy-efficient multiply-and-accumulate (MAC) operations in modern artificial intelligence (AI) edge devices. This work proposes a 128 × 128 SRAM CIM architecture with multi-bit precision (4b input, 4b weight, and 4b output). The 4b input is implemented using a voltage-scaling and charge-sharing-based scheme. To achieve efficient computation with improved linearity, a novel AND-logic-based 8T SRAM cell (AND8T) is proposed. To address the non-idealities of analog voltage- or current-based operations, the proposed AND8T employs charge-domain computation by overlaying a metal-oxide-metal capacitor (MOM cap) with no area overhead. The proposed AND8T mitigates the linearity issue of MAC operations, which is highly desirable for the reliable operation of complex neural networks (CNNs). The proposed 16Kb macro asserts 128 inputs in parallel and processes a 128-element 4b dot product in a single cycle for an array column (a single neuron). The macro can also be reconfigured for 64 or 32 parallel 4b inputs, depending on the needs of the CNN model. The AND8T SRAM macro is fabricated in a 65 nm node and achieves an energy efficiency of 301.08 TOPS/W for 16 parallel neuron outputs, with 128 4b MAC operations at a 10 MHz clock frequency and a 1 V supply. The implemented macro supports clock frequencies up to 100 MHz and occupies 0.124 mm² of chip area while achieving 96.05% and 87% classification accuracy on the MNIST and CIFAR-10 datasets, respectively.
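
As a rough behavioral illustration of the arithmetic the abstract describes, the Python sketch below models one array column (one neuron) as a dot product of 4b inputs and 4b weights, shows that the same result can be assembled from AND-gated 1b weight planes, and maps the result onto a 4b output code. It is not the authors' circuit: the unsigned encoding, the ideal uniform quantizer, the way weight bits are split into planes, and all function names are assumptions made for illustration only.

```python
import random

BITS = 4                     # 4b input, weight, and output precision (from the abstract)
MAX_VAL = (1 << BITS) - 1    # 15, assuming unsigned codes (an assumption)


def column_mac(inputs, weights):
    """Ideal dot product of one array column (one neuron)."""
    return sum(x * w for x, w in zip(inputs, weights))


def column_mac_bitplanes(inputs, weights):
    """Same dot product, assembled from 1b weight planes.

    A 1b weight AND-ed with an input passes either the input or zero,
    and the per-plane sums are combined with binary weighting.
    The plane split is an illustrative assumption, not the paper's layout.
    """
    total = 0
    for j in range(BITS):
        plane_sum = sum(x if (w >> j) & 1 else 0 for x, w in zip(inputs, weights))
        total += plane_sum << j
    return total


def quantize_output(mac, n_inputs):
    """Map a MAC value onto a 4b code with an ideal uniform quantizer (assumed)."""
    full_scale = n_inputs * MAX_VAL * MAX_VAL
    return (mac * (MAX_VAL + 1)) // (full_scale + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # 128 parallel inputs, or the reconfigured 64- and 32-input modes
    for n_inputs in (128, 64, 32):
        x = [rng.randint(0, MAX_VAL) for _ in range(n_inputs)]
        w = [rng.randint(0, MAX_VAL) for _ in range(n_inputs)]
        mac = column_mac(x, w)
        assert mac == column_mac_bitplanes(x, w)
        print(n_inputs, mac, quantize_output(mac, n_inputs))
```

The full-scale value in `quantize_output` scales with the number of active inputs, which is one simple way to picture how a 64- or 32-input configuration could still use the whole 4b output range; the actual macro may handle this differently.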

Updated: 2024-08-28
