Fetch-Directed Instruction Prefetching Revisited
arXiv - CS - Hardware Architecture. Pub Date: 2020-06-24, DOI: arxiv-2006.13547
Truls Asheim, Rakesh Kumar, Boris Grot

Prior work has observed that fetch-directed instruction prefetching (FDIP) is highly effective at covering instruction cache misses. The key to FDIP's effectiveness is a branch target buffer (BTB) large enough to accommodate the application's branch working set. In this work, we introduce several optimizations that significantly extend the reach of the BTB within the available storage budget. Our optimizations target nearly every source of storage overhead in each BTB entry: the tag, target address, and size fields. We observe that while most dynamic branch instances have short offsets, a large number of branches have longer offsets or require full target addresses. Based on this insight, we split the BTB into multiple smaller BTBs, each storing offsets of a different length. This enables a dramatic reduction in storage for target addresses. We further compress tags to 16 bits and avoid the basic-block-oriented BTB advocated in prior FDIP variants. The latter optimization eliminates the need to store the basic block size in each BTB entry. Our final design, called FDIP-X, uses an ensemble of 4 BTBs and always outperforms conventional FDIP with a unified basic-block-oriented BTB for equal storage budgets.
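
The partitioning idea can be illustrated with a small sketch. This is not the authors' implementation: the offset widths in `OFFSET_WIDTHS`, the 48-bit `FULL_TARGET_BITS`, and the helper names are hypothetical values chosen for illustration. Only the 16-bit tag and the count of 4 BTBs come from the abstract. The sketch routes each branch to the smallest BTB whose target field can hold its signed offset, and falls back to a full-address BTB otherwise.

```python
# Illustrative sketch only. The widths below are assumptions, not values
# taken from the paper; the abstract states only "4 BTBs" and 16-bit tags.
TAG_BITS = 16                  # from the abstract: tags compressed to 16 bits
OFFSET_WIDTHS = (8, 16, 24)    # hypothetical signed-offset field widths
FULL_TARGET_BITS = 48          # hypothetical full target address width
FULL_ADDRESS_BTB = len(OFFSET_WIDTHS)  # index of the fourth, full-address BTB


def signed_bits_needed(offset: int) -> int:
    """Minimum two's-complement width that can represent `offset`."""
    if offset >= 0:
        return offset.bit_length() + 1
    return (-offset - 1).bit_length() + 1


def pick_btb(branch_pc: int, target: int) -> int:
    """Index of the smallest BTB whose target field fits this branch."""
    need = signed_bits_needed(target - branch_pc)
    for index, width in enumerate(OFFSET_WIDTHS):
        if need <= width:
            return index
    return FULL_ADDRESS_BTB  # offset too long: store the full target


def entry_bits(btb_index: int) -> int:
    """Storage per entry (tag + target field) for the given BTB."""
    if btb_index == FULL_ADDRESS_BTB:
        return TAG_BITS + FULL_TARGET_BITS
    return TAG_BITS + OFFSET_WIDTHS[btb_index]
```

Under these assumed widths, a short forward branch such as `pick_btb(0x1000, 0x1010)` lands in the narrowest BTB, whose entries cost `entry_bits(0)` bits rather than the `entry_bits(FULL_ADDRESS_BTB)` bits a unified full-address entry would need. That per-entry gap is the source of the storage savings the abstract describes.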

Updated: 2020-06-25