Refresh Triggered Computation
ACM Transactions on Architecture and Code Optimization ( IF 1.5 ) Pub Date : 2020-12-30 , DOI: 10.1145/3417708
Syed M. A. H. Jafri, Hasan Hassan, Ahmed Hemani, Onur Mutlu
To employ a Convolutional Neural Network (CNN) in an energy-constrained embedded system, it is critical for the CNN implementation to be highly energy efficient. Many recent studies propose CNN accelerator architectures with custom computation units that try to improve the energy efficiency and performance of CNNs by minimizing data transfers from DRAM-based main memory. However, in these architectures, DRAM is still responsible for half of the overall energy consumption of the system, on average. A key factor of the high energy consumption of DRAM is the refresh overhead, which is estimated to consume 40% of the total DRAM energy. In this article, we propose a new mechanism, Refresh Triggered Computation (RTC), that exploits the memory access patterns of CNN applications to reduce the number of refresh operations. RTC uses two major techniques to mitigate the refresh overhead. First, Refresh Triggered Transfer (RTT) is based on our new observation that a CNN application accesses a large portion of the DRAM in a predictable and recurring manner. Thus, the read/write accesses of the application inherently refresh the DRAM, and therefore a significant fraction of refresh operations can be skipped. Second, Partial Array Auto-Refresh (PAAR) eliminates the refresh operations to DRAM regions that do not store any data. We propose three RTC designs (min-RTC, mid-RTC, and full-RTC), each of which requires a different level of aggressiveness in terms of customization to the DRAM subsystem. All of our designs have small overhead. Even the most aggressive RTC design (i.e., full-RTC) imposes an area overhead of only 0.18% in a 16 Gb DRAM chip and can have less overhead for denser chips. Our experimental evaluation on six well-known CNNs shows that RTC reduces average DRAM energy consumption by 24.4% and 61.3% for the least aggressive and the most aggressive RTC implementations, respectively.
Besides CNNs, we also evaluate our RTC mechanism on three workloads from other domains. We show that RTC saves 31.9% and 16.9% DRAM energy for Face Recognition and Bayesian Confidence Propagation Neural Network (BCPNN) , respectively. We believe RTC can be applied to other applications whose memory access patterns remain predictable for a sufficiently long time.
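The core idea behind RTT and PAAR can be illustrated with a small simulation. The sketch below is not the paper's implementation; it is a hypothetical refresh-controller model (all class and variable names are our own) that skips a row's refresh when a recent read/write has already restored its charge (the RTT observation) or when the row holds no allocated data (the PAAR optimization):

```python
# Hypothetical sketch of RTT/PAAR-style refresh skipping.
# RETENTION_MS models the DRAM retention window within which a cell
# needs no refresh after being accessed; 64 ms is the common DDR default.
RETENTION_MS = 64


class RefreshController:
    def __init__(self, num_rows, allocated_rows):
        self.last_touch = {row: 0 for row in range(num_rows)}
        self.allocated = set(allocated_rows)
        self.refreshes_issued = 0
        self.refreshes_skipped = 0

    def access(self, row, now_ms):
        # A read/write inherently restores the row's charge,
        # so it counts as an implicit refresh (RTT observation).
        self.last_touch[row] = now_ms

    def refresh_tick(self, now_ms):
        # Periodic refresh pass: decide per row whether a real
        # refresh command is needed.
        for row, last in self.last_touch.items():
            if row not in self.allocated:
                # PAAR: region stores no data, skip its refresh.
                self.refreshes_skipped += 1
            elif now_ms - last < RETENTION_MS:
                # RTT: a recent access already refreshed this row.
                self.refreshes_skipped += 1
            else:
                self.refreshes_issued += 1
                self.last_touch[row] = now_ms


# Usage: 8 rows, only rows 0-3 hold data; the CNN's predictable
# access stream touches rows 0 and 1 shortly before the refresh pass.
ctrl = RefreshController(num_rows=8, allocated_rows={0, 1, 2, 3})
ctrl.access(0, now_ms=50)
ctrl.access(1, now_ms=50)
ctrl.refresh_tick(now_ms=100)
print(ctrl.refreshes_issued, ctrl.refreshes_skipped)  # → 2 6
```

Only rows 2 and 3 need a real refresh: rows 0 and 1 are covered by the recent accesses, and rows 4-7 are unallocated. The more predictable and recurring the access pattern, the larger the fraction of refreshes that can be skipped, which is why CNN workloads are a good fit.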
