Gretch
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2021-02-10, DOI: 10.1145/3439803
Anirudh Mohan Kaushik 1 , Gennady Pekhimenko 2 , Hiren Patel 3

Data-dependent memory accesses (DDAs) pose an important challenge for high-performance graph analytics (GA) because they exhibit too little temporal and spatial locality, which results in poor cache performance. Prior efforts to improve the performance of DDAs for GA do not apply across GA frameworks because (1) they focus on one particular graph representation, and (2) they require workload changes to communicate specific information to the hardware. In this work, we propose a hardware-only solution that improves the performance of DDAs across multiple GA frameworks. We present Gretch, a hardware prefetcher for GA that addresses the above limitations. An important observation we make is that identifying certain DDAs without hardware-software communication is sensitive to instruction scheduling. A key contribution of this work is a hardware mechanism that activates Gretch to identify DDAs under either in-order or out-of-order instruction scheduling. Our evaluation across different GA workloads and frameworks shows that Gretch provides an average speedup of 38% over no prefetching and 25% over a conventional stride prefetcher, and outperforms prior DDA prefetchers by 22%, with only a 1% increase in power consumption.
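
To make the notion of a data-dependent access concrete, the following is a minimal sketch of a neighbor traversal over a graph stored in compressed sparse row (CSR) form. The CSR layout, the function name `sum_neighbor_values`, and the sample graph are assumptions chosen for illustration; the abstract does not say which representations or kernels Gretch targets, and this is not the paper's code.

```python
from typing import List


def sum_neighbor_values(
    offsets: List[int], neighbors: List[int], values: List[float]
) -> List[float]:
    """For each vertex, sum the values of its out-neighbors (CSR layout)."""
    num_vertices = len(offsets) - 1
    result = [0.0] * num_vertices
    for v in range(num_vertices):
        # Streaming accesses: offsets[v] and neighbors[e] are read in
        # increasing index order, the pattern a stride prefetcher handles.
        for e in range(offsets[v], offsets[v + 1]):
            u = neighbors[e]
            # Data-dependent access: the index u was itself loaded from
            # memory, so successive reads of values[u] may touch unrelated
            # locations and show little spatial or temporal locality.
            result[v] += values[u]
    return result


# Hypothetical 4-vertex graph: 0 -> {1, 3}, 1 -> {2}, 2 -> {0, 3}, 3 -> {}
offsets = [0, 2, 3, 5, 5]
neighbors = [1, 3, 2, 0, 3]
values = [10.0, 20.0, 30.0, 40.0]
sums = sum_neighbor_values(offsets, neighbors, values)
```

In this sketch the read of `values[u]` is the DDA: its address is only known once `neighbors[e]` has been loaded, which is why it is hard to predict from the address stream alone.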

Updated: 2021-02-10