A perceptron-based replication scheme for managing the shared last level cache, Microprocessors and Microsystems



Microprocessors and Microsystems (IF 1.9), Pub Date: 2021-06-29, DOI: 10.1016/j.micpro.2021.104310
Qianqian Wu, Zhenzhou Ji

The shared last level cache (SLLC), which provides a large effective cache capacity, is widely adopted in modern chip multiprocessors (CMPs). However, long on-chip access latency in the SLLC is a key problem that hurts system performance. Replication is an effective way to relieve this problem by storing replicas of L1 victims in the nearby local LLC slice. Previous replication schemes, however, either create replicas blindly, without considering any feature of the cache blocks, or select replicas based on a single feature (such as data type or access count), which reduces replication accuracy and limits the achievable performance improvement. In this paper, motivated by the recent success of machine learning (ML) in computer architecture optimization, we develop a novel perceptron-based replication scheme (PBR) for effectively managing the SLLC in CMPs. Unlike existing single-feature-based schemes, PBR uses a perceptron to combine four features related to the reuse behavior of L1 victims, namely address (Addr), program counter (PC), data type (DT), and access count (AC), to improve the accuracy of replica selection. Experimental results show that, compared with two previously proposed single-feature-based replication schemes, ASR and LADR, PBR decreases execution time by 6.59% and 18.27% and reduces network traffic by 10.35% and 13.18%, respectively, with negligible energy, hardware, and area overhead.
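To make the idea of combining the four features through a perceptron more concrete, the following is a minimal sketch of a generic hashed-perceptron predictor. It is not the authors' implementation: the abstract does not give table sizes, index functions, thresholds, or the training rule, so `TABLE_SIZE`, `WEIGHT_MAX`, `WEIGHT_MIN`, `THRESHOLD`, `TRAIN_MARGIN`, the `ReplicaPredictor` class, and the 64-byte block size are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# All constants below are illustrative assumptions, not values from the paper.
TABLE_SIZE = 256     # entries per feature table
WEIGHT_MAX = 31      # saturating upper bound for a weight
WEIGHT_MIN = -32     # saturating lower bound for a weight
THRESHOLD = 0        # replicate when the summed weights reach this value
TRAIN_MARGIN = 8     # keep training while |sum| is below this margin


@dataclass
class ReplicaPredictor:
    """Hypothetical predictor with one weight table per feature."""

    # Four tables, in order: Addr, PC, DT, AC.
    tables: list = field(
        default_factory=lambda: [[0] * TABLE_SIZE for _ in range(4)]
    )

    @staticmethod
    def _indices(addr, pc, data_type, access_count):
        # Hypothetical index functions; the shift by 6 assumes 64-byte blocks.
        return (
            (addr >> 6) % TABLE_SIZE,
            (pc >> 2) % TABLE_SIZE,
            data_type % TABLE_SIZE,
            min(access_count, TABLE_SIZE - 1),
        )

    def score(self, addr, pc, data_type, access_count):
        # Perceptron output: sum of the weights selected by each feature.
        idx = self._indices(addr, pc, data_type, access_count)
        return sum(table[i] for table, i in zip(self.tables, idx))

    def should_replicate(self, *features):
        return self.score(*features) >= THRESHOLD

    def train(self, reused, *features):
        # Update on a misprediction or when the sum is not yet confident.
        total = self.score(*features)
        predicted = total >= THRESHOLD
        if predicted == reused and abs(total) >= TRAIN_MARGIN:
            return
        step = 1 if reused else -1
        for table, i in zip(self.tables, self._indices(*features)):
            table[i] = max(WEIGHT_MIN, min(WEIGHT_MAX, table[i] + step))


if __name__ == "__main__":
    predictor = ReplicaPredictor()
    victim = (0x7F001040, 0x400A10, 1, 3)  # made-up (Addr, PC, DT, AC) values
    for _ in range(4):
        predictor.train(False, *victim)    # the victim was never reused
    print(predictor.should_replicate(*victim))
```

In this sketch, each feature selects one weight, the weights are summed, and an L1 victim would be replicated only when the sum reaches the threshold. Training nudges the selected weights toward the observed reuse outcome, which is one common way a perceptron can weigh several features jointly rather than relying on a single one.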


Updated: 2021-07-09
