Early Address Prediction
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2021-06-08, DOI: 10.1145/3458883
Ricardo Alves, Stefanos Kaxiras, David Black-Schaffer

Achieving low load-to-use latency with low energy and storage overheads is critical for performance. Existing techniques either prefetch into the pipeline (via address prediction and validation) or provide data reuse in the pipeline (via register sharing or L0 caches). These techniques provide a range of tradeoffs between latency, reuse, and overhead. In this work, we present a pipeline prefetching technique that achieves state-of-the-art performance and data reuse without additional data storage, data movement, or validation overheads by adding address tags to the register file. Our addition of register file tags allows us to forward (reuse) load data from the register file with no additional data movement, keep the data alive in the register file beyond the instruction’s lifetime to increase temporal reuse, and coalesce prefetch requests to achieve spatial reuse. Further, we show that we can use the existing memory order violation detection hardware to validate prefetches and data forwards without additional overhead. Our design achieves the performance of existing pipeline prefetching while also forwarding 32% of the loads from the register file (compared to 15% in state-of-the-art register sharing), delivering a 16% reduction in L1 dynamic energy (1.6% total processor energy), with an area overhead of less than 0.5%.
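The register-file tagging idea in the abstract can be illustrated with a small software model. This is only a toy sketch under stated assumptions, not the authors' hardware design: the class name `TaggedRegisterFile`, the round-robin allocation, and the invalidate-on-store check are all hypothetical stand-ins (the paper instead reuses the existing memory order violation detection hardware for validation, and it also covers address prediction and prefetch coalescing, which are not modeled here). The sketch shows only how an address tag per register lets a later load to the same address be served by sharing the register, with no access to memory and no data movement.

```python
from typing import Dict, List, Optional, Tuple


class TaggedRegisterFile:
    """Toy model: each physical register carries an optional address tag.

    Hypothetical illustration only; names and policies are assumptions.
    """

    def __init__(self, num_regs: int) -> None:
        self.values: List[int] = [0] * num_regs
        # Tag = memory address the register's value was loaded from (None = untagged).
        self.tags: List[Optional[int]] = [None] * num_regs
        self.next_free = 0  # assumed round-robin allocation, not from the paper

    def load(self, memory: Dict[int, int], addr: int) -> Tuple[int, bool]:
        """Return (physical register index, forwarded_from_register_file)."""
        for idx, tag in enumerate(self.tags):
            if tag == addr:
                # Tag hit: share the existing register, memory is not accessed.
                return idx, True
        # Tag miss: fetch from memory and tag the newly allocated register.
        idx = self.next_free
        self.next_free = (self.next_free + 1) % len(self.tags)
        self.values[idx] = memory[addr]
        self.tags[idx] = addr
        return idx, False

    def store(self, memory: Dict[int, int], addr: int, value: int) -> None:
        """Write memory and drop stale tags (assumed stand-in for validation)."""
        memory[addr] = value
        for idx, tag in enumerate(self.tags):
            if tag == addr:
                self.tags[idx] = None
```

In this model, two back-to-back loads from the same address return the same register index, with the second one flagged as forwarded; a store to that address in between clears the tag, so the next load goes back to memory.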

Updated: 2021-06-08