TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain
arXiv - CS - Hardware Architecture. Pub Date: 2020-05-03, DOI: arxiv-2005.01206
Weitao Li, Pengfei Xu, Yang Zhao, Haitong Li, Yuan Xie, Yingyan Lin

Resistive-random-access-memory (ReRAM) based processing-in-memory (R$^2$PIM) accelerators show promise in bridging the gap between the constrained resources of Internet of Things devices and the prohibitive energy cost of Convolutional/Deep Neural Networks (CNNs/DNNs). Specifically, R$^2$PIM accelerators enhance energy efficiency by eliminating the cost of weight movements and improve computational density through ReRAM's high density. However, their energy efficiency is still limited by the dominant energy cost of input and partial sum (Psum) movements and by the cost of digital-to-analog (D/A) and analog-to-digital (A/D) interfaces. In this work, we identify three energy-saving opportunities in R$^2$PIM accelerators (analog data locality, time-domain interfacing, and input access reduction) and propose an innovative R$^2$PIM accelerator called TIMELY, with three key contributions: (1) TIMELY adopts analog local buffers (ALBs) within ReRAM crossbars to greatly enhance data locality, minimizing the energy overheads of both input and Psum movements; (2) TIMELY largely reduces the energy of each D/A (and A/D) conversion by using time-domain interfaces (TDIs), and reduces the total number of conversions by using the ALBs; (3) we develop an only-once input read (O$^2$IR) mapping method to further decrease the energy of input accesses and the number of D/A conversions. The evaluation with more than 10 CNN/DNN models and various chip configurations shows that TIMELY outperforms the baseline R$^2$PIM accelerator, PRIME, by one order of magnitude in energy efficiency while maintaining better computational density (up to 31.2$\times$) and throughput (up to 736.6$\times$). Furthermore, comprehensive studies are performed to evaluate the effectiveness of the proposed ALB, TDI, and O$^2$IR innovations in terms of energy savings and area reduction.
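
To make the input-access argument in contribution (3) concrete, the following is a minimal counting sketch, not the paper's O$^2$IR mapping itself. It only compares how many input-element reads a single-channel convolution layer needs when every output window fetches its own patch versus when each input element is fetched exactly once. The function names and the 32x32 feature map with a 3x3 kernel are hypothetical values chosen for illustration; no energy figures are modeled.

```python
def sliding_window_reads(height, width, kernel, stride=1):
    """Input-element reads when every output window fetches its own K*K patch.

    Assumes a single channel and no padding (illustrative simplification).
    """
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1
    return out_h * out_w * kernel * kernel


def read_once_reads(height, width):
    """Input-element reads when each element is fetched exactly once."""
    return height * width


if __name__ == "__main__":
    h, w, k = 32, 32, 3  # hypothetical feature-map and kernel sizes
    naive = sliding_window_reads(h, w, k)
    once = read_once_reads(h, w)
    print(f"per-window reads: {naive}, read-once: {once}, ratio: {naive / once:.2f}")
```

For these assumed sizes the per-window scheme performs 8100 reads against 1024 for the read-once scheme. If each read also implies a D/A conversion, the same ratio would apply to the conversion count, which is the kind of redundancy the abstract says the mapping targets.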

Updated: 2020-05-05