


Reuse Distance-based Copy-backs of Clean Cache Lines to Lower-level Caches
arXiv - CS - Hardware Architecture, Pub Date: 2021-05-30, DOI: arxiv-2105.14442
Rui Wang, Chundong Wang, Chongnan Ye

Caches play a critical role in narrowing the performance gap between the CPU and main memory. A modern multi-core CPU generally employs a multi-level cache hierarchy in which the most recently and frequently used data are kept in each core's private caches, while all cores share the last-level cache (LLC). In an inclusive hierarchy, clean cache lines replaced in higher-level caches need not be copied back to lower levels, because inclusiveness guarantees that they already exist there. For the exclusive and non-inclusive caches widely used by Intel, AMD, and ARM today, copying back either all or none of the replaced clean cache lines to lower levels violates neither the exclusive nor the non-inclusive definition. Our quantitative study shows that, for exclusive caches, copying back all or none of the clean cache lines to the lower-level cache yields suboptimal performance. The reason is that only some cache lines are reused, while the others turn out to be dead in the long run. This observation motivates us to selectively copy back clean cache lines to the LLC in an exclusive or non-inclusive cache architecture. We revisit the concept of the reuse distance of cache lines. In a nutshell, a clean cache line with a short reuse distance is copied back to the lower-level cache because it is likely to be re-referenced in the near future, whereas cache lines with much longer reuse distances are discarded, or written back to memory if they are dirty. We have implemented and evaluated our proposal with a non-volatile (STT-MRAM) LLC. Experimental results with gem5 and the SPEC CPU 2017 benchmarks show that, on average, our proposal yields up to 12.8% higher IPC (instructions per cycle) than the least-recently-used (LRU) replacement policy that copies back all clean cache lines to the STT-MRAM LLC.
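The copy-back decision described above can be illustrated with a toy model. The sketch below is not the authors' gem5 implementation; it is a minimal simulation under stated assumptions. Both the private L2 and the exclusive LLC are modeled as fully associative LRU caches. Reuse distance is taken to be the number of distinct other lines touched between two consecutive accesses to the same line. Because the abstract does not say how the distance of an evicted line is predicted, the sketch uses a hypothetical predictor: the last distance observed for that line, with never-reused lines treated as infinitely distant. The names `reuse_distances`, `ExclusiveHierarchy`, `threshold` and `last_rd` are all illustrative.

```python
import math
from collections import OrderedDict


def reuse_distances(trace):
    """For each access, count the distinct other lines touched since the
    previous access to the same line (None for a first touch)."""
    last_seen = {}
    result = []
    for i, line in enumerate(trace):
        if line in last_seen:
            result.append(len(set(trace[last_seen[line] + 1:i])))
        else:
            result.append(None)
        last_seen[line] = i
    return result


class ExclusiveHierarchy:
    """Toy private L2 plus exclusive LLC, both fully associative with LRU.

    On an L2 eviction the victim goes to the LLC only if its predicted
    reuse distance is <= threshold (a hypothetical tuning knob); otherwise
    a clean victim is dropped and a dirty one is written to memory.
    threshold=math.inf reproduces the "copy back every clean line" baseline.
    """

    def __init__(self, l2_lines, llc_lines, threshold):
        self.l2 = OrderedDict()   # line -> dirty flag, oldest entry first
        self.llc = OrderedDict()  # same layout for the exclusive LLC
        self.l2_lines = l2_lines
        self.llc_lines = llc_lines
        self.threshold = threshold
        # Hypothetical predictor: the last reuse distance observed per line.
        self.last_rd = {}
        self.stats = dict(l2_hit=0, llc_hit=0, mem=0,
                          copy_back=0, dropped=0, writeback=0)

    def _insert_llc(self, line, dirty):
        self.llc[line] = dirty
        if len(self.llc) > self.llc_lines:
            _, victim_dirty = self.llc.popitem(last=False)  # LRU victim
            if victim_dirty:
                self.stats["writeback"] += 1

    def _evict_from_l2(self):
        victim, dirty = self.l2.popitem(last=False)  # LRU victim
        # Lines never reused so far are treated as having infinite distance.
        if self.last_rd.get(victim, math.inf) <= self.threshold:
            self.stats["copy_back"] += 1
            self._insert_llc(victim, dirty)
        elif dirty:
            self.stats["writeback"] += 1  # long distance, dirty: to memory
        else:
            self.stats["dropped"] += 1    # long distance, clean: discard

    def access(self, line, write=False):
        if line in self.l2:
            self.stats["l2_hit"] += 1
            self.l2.move_to_end(line)
            self.l2[line] = self.l2[line] or write
            return
        if line in self.llc:
            self.stats["llc_hit"] += 1
            dirty = self.llc.pop(line)  # exclusive: the line leaves the LLC
        else:
            self.stats["mem"] += 1
            dirty = False
        self.l2[line] = dirty or write
        if len(self.l2) > self.l2_lines:
            self._evict_from_l2()

    def run(self, trace, writes=frozenset()):
        """Replay a trace; writes holds the indices of store accesses."""
        for i, (line, rd) in enumerate(zip(trace, reuse_distances(trace))):
            if rd is not None:
                self.last_rd[line] = rd  # distance is known at this access
            self.access(line, write=i in writes)
        return self.stats


if __name__ == "__main__":
    # "A" is reused at short distance; the X lines are streamed only once.
    trace = ["A", "X1", "A", "X2", "X3", "A", "X4", "X5", "A"]
    for thr in (3, math.inf):
        s = ExclusiveHierarchy(l2_lines=1, llc_lines=1, threshold=thr).run(trace)
        print(thr, s["llc_hit"], s["mem"])
    # prints "3 2 7" then "inf 1 8"
```

On this hand-made trace, the selective policy drops the streamed X lines instead of copying them back. This leaves `A` resident in the one-line LLC and yields two LLC hits instead of one. The trace is contrived to show the mechanism and says nothing about the magnitude of the gains reported for SPEC CPU 2017.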


Updated: 2021-06-01
