A Study of Runtime Adaptive Prefetching for STTRAM L1 Caches
arXiv - CS - Hardware Architecture Pub Date : 2020-09-24 , DOI: arxiv-2009.11442
Kyle Kuan and Tosiron Adegbija

Spin-Transfer Torque RAM (STTRAM) is a promising alternative to SRAM in on-chip caches because of its non-volatility, low leakage, high integration density, and CMOS compatibility. Prior studies have shown that relaxing the STTRAM retention time and adapting it to runtime application needs can substantially reduce overall cache energy without significant latency overheads, because STTRAM write energy and latency are lower at shorter retention times. In this paper, as a first step towards efficient prefetching across the STTRAM cache hierarchy, we study prefetching in reduced-retention STTRAM L1 caches. Using SPEC CPU 2017 benchmarks, we analyze the energy and latency impact of different prefetch distances at different STTRAM cache retention times for different applications. We show that expired_unused_prefetches (the number of unused prefetches expired by the reduced-retention STTRAM cache) can accurately determine the best retention time for energy consumption and access latency. This new metric can also provide insights into the best prefetch distance for memory bandwidth consumption and prefetch accuracy. Based on our analysis and insights, we propose Prefetch-Aware Retention time Tuning (PART) and Retention time-based Prefetch Control (RPC). Compared to a base STTRAM cache, PART and RPC collectively reduced the average cache energy and latency by 22.24% and 24.59%, respectively. When the base architecture was augmented with the state-of-the-art near-side prefetch throttling (NST), PART+RPC reduced the average cache energy and latency by 3.50% and 3.59%, respectively, and reduced the hardware overhead by 54.55%.
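To make the expired_unused_prefetches metric from the abstract concrete, the following is a minimal sketch of how such a counter might be maintained in a toy cache model. It is not the authors' implementation: the class `ToyRetentionCache`, its methods, the cycle-based timestamps, and the fully associative organization are all hypothetical, and the sketch assumes that a demand hit does not refresh a block's retention timer.

```python
from dataclasses import dataclass


@dataclass
class Block:
    filled_at: int      # cycle at which the block was written into the cache
    prefetched: bool    # True if the prefetcher brought the block in
    used: bool = False  # True once a demand access has touched the block


class ToyRetentionCache:
    """Hypothetical fully associative cache whose blocks expire after
    `retention` cycles (an illustrative stand-in for reduced-retention STTRAM)."""

    def __init__(self, retention):
        self.retention = retention
        self.blocks = {}  # address -> Block
        self.expired_unused_prefetches = 0

    def _expire(self, now):
        # Drop every block whose retention time has elapsed, and count the
        # prefetched blocks that no demand access ever used.
        expired = [a for a, b in self.blocks.items()
                   if now - b.filled_at >= self.retention]
        for addr in expired:
            block = self.blocks.pop(addr)
            if block.prefetched and not block.used:
                self.expired_unused_prefetches += 1

    def prefetch(self, addr, now):
        self._expire(now)
        if addr not in self.blocks:
            self.blocks[addr] = Block(filled_at=now, prefetched=True)

    def access(self, addr, now):
        """Demand access; returns True on a hit, False on a miss."""
        self._expire(now)
        block = self.blocks.get(addr)
        if block is not None:
            block.used = True  # assumption: a hit does not reset the timer
            return True
        self.blocks[addr] = Block(filled_at=now, prefetched=False, used=True)
        return False


if __name__ == "__main__":
    cache = ToyRetentionCache(retention=10)
    cache.prefetch(0x40, now=0)   # later used by a demand access
    cache.prefetch(0x80, now=0)   # never used before it expires
    cache.access(0x40, now=5)
    cache.access(0xC0, now=20)    # triggers expiry of both earlier blocks
    print(cache.expired_unused_prefetches)
```

In this toy model, a high counter value at a given retention time suggests that prefetched data is expiring before it is needed, which is the kind of signal the abstract says can guide the choice of retention time and prefetch distance.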

Updated: 2020-09-25