CASH-RAM: Enabling In-Memory Computations for Edge Inference Using Charge Accumulation and Sharing in Standard 8T-SRAM Arrays, IEEE Journal on Emerging and Selected Topics in Circuits and Systems



CASH-RAM: Enabling In-Memory Computations for Edge Inference Using Charge Accumulation and Sharing in Standard 8T-SRAM Arrays
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IF 3.7), Pub Date: 2020-08-04, DOI: 10.1109/jetcas.2020.3014250
Amogh Agrawal, Adarsh Kosta, Sangamesh Kodge, Dong Eun Kim, Kaushik Roy

Machine Learning (ML) workloads, being memory- and compute-intensive, consume large amounts of power when running on conventional computing systems, restricting their implementation to large-scale data centers. Transferring large amounts of data from edge devices to data centers is not only energy-expensive but sometimes undesirable in security-critical applications. Thus, there is a need for building domain-specific hardware primitives for energy-efficient ML processing at the edge. One such approach, in-memory computing, eliminates frequent and unnecessary data transfers between the memory and the compute units by computing on the data directly where it is stored. However, the analog nature of the computations introduces non-idealities, which degrade the overall accuracy of neural networks. In this paper, we propose an in-memory computing primitive for accelerating dot-products within standard 8T-SRAM caches, using charge sharing. The inherent parasitic capacitance of the bitlines and sourcelines is used for accumulating analog voltages, which can be sensed for an approximate dot product. The charge-sharing approach involves a self-compensation technique that reduces the effects of non-idealities, thereby reducing the errors. Our results for ternary-weight neural networks show that, using the proposed compensation approaches, the accuracy degradation is within 1% and 5% of the baseline accuracy for the MNIST and CIFAR-10 datasets, respectively, with an energy-delay product improvement of 38× over a standard von Neumann computing system. We believe that this work can be used in conjunction with existing mitigation techniques, such as re-training approaches, to further enhance system performance.
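
As a rough illustration of the idea in the abstract, the sketch below models a ternary-weight dot product and an idealized charge-sharing step in Python. It is not the circuit from the paper: the function names, the equal-capacitance assumption, the `cap_mismatch` parameter, and the use of signed voltages are all hypothetical simplifications. The only physics assumed is charge conservation, so that capacitors shorted together settle at total charge divided by total capacitance.

```python
import numpy as np


def ternary_dot(inputs, weights):
    """Exact dot product with weights restricted to {-1, 0, +1}."""
    weights = np.asarray(weights)
    if not np.isin(weights, (-1, 0, 1)).all():
        raise ValueError("weights must be ternary (-1, 0, or +1)")
    return float(np.dot(inputs, weights))


def charge_share(voltages, capacitances):
    """Voltage after ideal charge sharing: total charge / total capacitance."""
    v = np.asarray(voltages, dtype=float)
    c = np.asarray(capacitances, dtype=float)
    return float(np.sum(c * v) / np.sum(c))


def approx_dot_via_sharing(inputs, weights, cap_mismatch=0.0, seed=0):
    """Toy model (an assumption, not the paper's scheme).

    Each line is taken to hold a voltage proportional to input * weight.
    Sharing charge across N lines yields their capacitance-weighted mean,
    which is scaled by N to estimate the dot product. A small nonzero
    cap_mismatch perturbs the capacitances and so the estimate.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(inputs, dtype=float)
    w = np.asarray(weights, dtype=float)
    caps = 1.0 + cap_mismatch * rng.standard_normal(x.size)
    return charge_share(x * w, caps) * x.size


if __name__ == "__main__":
    x = [0.2, 0.5, 0.1, 0.4]
    w = [1, -1, 0, 1]
    print("exact:", ternary_dot(x, w))
    print("ideal sharing:", approx_dot_via_sharing(x, w))
    print("with 5% mismatch:", approx_dot_via_sharing(x, w, cap_mismatch=0.05))
```

With matched capacitances the shared voltage recovers the exact dot product up to floating-point rounding; with mismatch it only approximates it, which is the kind of analog non-ideality the abstract says the compensation technique targets.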


Updated: 2020-08-04
