Reuse Cache for Heterogeneous CPU-GPU Systems
arXiv - CS - Hardware Architecture. Pub Date: 2021-07-28, arXiv:2107.13649
Tejas Shah, Bobbi Yogatama, Kyle Roarty, Rami Dahman

It is generally observed that the fraction of live lines in the shared last-level cache (SLLC) is very small for chip multiprocessors (CMPs). This can be tackled with promotion-based replacement policies such as re-reference interval prediction (RRIP) in place of LRU, with dead-block predictors, or with reuse-based cache allocation schemes. In GPU systems, similar LLC issues are alleviated using various cache bypassing techniques. These issues are worsened in heterogeneous CPU-GPU systems because the two processors have different data access patterns and frequencies. GPUs generally work on streaming data but have many more threads accessing memory than CPUs do. As a result, most traditional cache replacement and allocation policies prove ineffective: the much higher access rate of GPU applications skews allocation toward GPU cache lines despite their minimal reuse. In this work, we implement the Reuse Cache approach for heterogeneous CPU-GPU systems. The reuse cache is a decoupled tag/data SLLC designed to store only data that is accessed more than once. This design is based on the observation that most cache lines in the LLC are stored but never reused before being replaced. We find that the reuse cache comes within 0.5% of the IPC gains of a statically partitioned LLC while reducing the area cost of the LLC by an average of 40%.
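The core idea of a decoupled tag/data cache can be illustrated with a small simulation. The sketch below is not the paper's implementation; it is a minimal model under the assumption that the tag array tracks more addresses than the data array can hold, and that a line earns a data entry only on its second access (i.e., after demonstrating reuse). All names (`ReuseCache`, `access`, `fill`) are hypothetical, and replacement in both arrays is plain LRU for simplicity.

```python
from collections import OrderedDict

class ReuseCache:
    """Minimal sketch of a decoupled tag/data LLC (hypothetical model):
    the tag array tracks more lines than the data array can hold, and a
    line's data is allocated only on its second access (first reuse)."""

    def __init__(self, tag_entries, data_entries):
        self.tag_entries = tag_entries
        self.data_entries = data_entries
        self.tags = OrderedDict()   # address -> seen-once marker, in LRU order
        self.data = OrderedDict()   # address -> cached data, in LRU order

    def access(self, addr, fill):
        # Data hit: the line earned its data slot on an earlier reuse.
        if addr in self.data:
            self.data.move_to_end(addr)
            return ("data_hit", self.data[addr])
        # Tag hit: second access -> allocate data now (reuse demonstrated).
        if addr in self.tags:
            self.tags.move_to_end(addr)
            if len(self.data) >= self.data_entries:
                self.data.popitem(last=False)   # evict LRU data line
            self.data[addr] = fill()
            return ("tag_hit", self.data[addr])
        # Miss: record only the tag; no data entry is spent on the line.
        if len(self.tags) >= self.tag_entries:
            self.tags.popitem(last=False)       # evict LRU tag
        self.tags[addr] = True
        return ("miss", fill())
```

This captures why the approach suits CPU-GPU sharing: GPU streaming lines that are touched once occupy only cheap tag entries, never displacing CPU-resident data from the (smaller) data array.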

Updated: 2021-07-30