CASH-RAM: Enabling In-Memory Computations for Edge Inference using Charge Accumulation and Sharing in Standard 8T-SRAM Arrays
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IF 4.6), Pub Date: 2020-09-01, DOI: 10.1109/jetcas.2020.3014250
Amogh Agrawal, Adarsh Kosta, Sangamesh Kodge, Dong Eun Kim, Kaushik Roy

Machine learning (ML) workloads are memory- and compute-intensive, so running them on conventional computing systems consumes large amounts of power, which restricts their deployment to large-scale data centers. Transferring large amounts of data from edge devices to data centers is not only energy-expensive but sometimes undesirable in security-critical applications. There is therefore a need for domain-specific hardware primitives for energy-efficient ML processing at the edge. One such approach, in-memory computing, eliminates frequent and unnecessary data transfers between the memory and the compute units by performing computations directly where the data are stored. However, the analog nature of these computations introduces non-idealities that degrade the overall accuracy of neural networks. In this paper, we propose an in-memory computing primitive that accelerates dot products within standard 8T-SRAM caches using charge sharing. The inherent parasitic capacitance of the bitlines and sourcelines is used to accumulate analog voltages, which can be sensed to obtain an approximate dot product. The charge-sharing approach includes a self-compensation technique that reduces the effects of non-idealities and thereby the computation errors. Our results for ternary-weight neural networks show that, with the proposed compensation approaches, accuracy degradation stays within 1% and 5% of the baseline accuracy for the MNIST and CIFAR-10 datasets, respectively, with a 38× improvement in energy-delay product over a standard von Neumann computing system. We believe this work can be combined with existing mitigation techniques, such as re-training approaches, to further enhance system performance.
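As a rough illustration of the charge-accumulation idea described in the abstract, the Python sketch below models ideal charge sharing. When capacitors holding voltages V_i are shorted together, charge conservation gives V = ΣC_iV_i / ΣC_i, so with equal capacitances the shared voltage is proportional to the sum of the inputs. The encoding of inputs as line voltages, the split of ternary weights into a "positive" and a "negative" accumulation node, and the names `ternary_dot_charge_share`, `volts_per_unit` and `c_line` are all hypothetical assumptions. This is an idealized behavioral model that ignores the non-idealities and the self-compensation the paper addresses, not the CASH-RAM circuit itself.

```python
from typing import Sequence


def charge_share(voltages: Sequence[float], caps: Sequence[float]) -> float:
    """Voltage after ideal capacitors are connected together.

    Charge is conserved, so V_final = sum(C_i * V_i) / sum(C_i).
    """
    if len(voltages) != len(caps):
        raise ValueError("voltages and caps must have the same length")
    total_charge = sum(c * v for c, v in zip(caps, voltages))
    return total_charge / sum(caps)


def ternary_dot_charge_share(
    x: Sequence[float],
    w: Sequence[int],
    volts_per_unit: float = 0.1,  # hypothetical input-to-voltage scale
    c_line: float = 1.0,          # hypothetical, identical line capacitance
) -> float:
    """Idealized behavioral model of a ternary dot product via charge sharing.

    Hypothetical encoding (not the CASH-RAM circuit): line i is charged to
    x[i] * volts_per_unit on the node selected by w[i] and held at 0 V on
    the other node; all N lines of each node are then shorted together.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if any(wi not in (-1, 0, 1) for wi in w):
        raise ValueError("weights must be ternary: -1, 0 or +1")
    n = len(x)
    caps = [c_line] * n
    pos = [xi * volts_per_unit if wi == 1 else 0.0 for xi, wi in zip(x, w)]
    neg = [xi * volts_per_unit if wi == -1 else 0.0 for xi, wi in zip(x, w)]
    # With equal capacitances each shared voltage is the mean of its line
    # voltages, so scaling by n and dividing by volts_per_unit recovers the sum.
    v_diff = charge_share(pos, caps) - charge_share(neg, caps)
    return v_diff * n / volts_per_unit


x = [1.0, 2.0, 3.0, 4.0]
w = [1, -1, 0, 1]
print(round(ternary_dot_charge_share(x, w), 9))  # → 3.0 (ideal: 1 - 2 + 4)
```

In this idealized model the read-out is exact up to floating-point rounding; in real bitlines and sourcelines, effects such as unequal or extra parasitic capacitance would distort the shared voltage, which is the kind of error the paper's compensation technique targets.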

Updated: 2020-09-01