Software Prefetching for Indirect Memory Accesses
ACM Transactions on Computer Systems (IF 2.0), Pub Date: 2019-06-18, DOI: 10.1145/3319393
Sam Ainsworth, Timothy M. Jones

Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting solution is software prefetching, where special non-blocking loads bring data into the cache hierarchy shortly before it is required. However, these prefetches are difficult to insert in a way that actually improves performance, and techniques for automatic insertion are currently limited. This article develops a novel compiler pass that automatically generates software prefetches for indirect memory accesses, a special class of irregular memory accesses often seen in high-performance workloads. We evaluate the pass across a wide set of systems, all of which benefit from the technique. We then evaluate the extent to which good prefetch instructions are architecture dependent, and which classes of programs are particularly amenable. Across a set of memory-bound benchmarks, our automated pass achieves average speedups of 1.3× for an Intel Haswell processor, 1.1× for both an ARM Cortex-A57 and Qualcomm Kryo, 1.2× for an ARM Cortex-A72 and an Intel Kaby Lake, and 1.35× for an Intel Xeon Phi Knights Landing, each of which is an out-of-order core, and performance improvements of 2.1× and 2.7× for the in-order ARM Cortex-A53 and first-generation Intel Xeon Phi, respectively.
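
For readers unfamiliar with the pattern, the following is a minimal hand-written sketch of a software prefetch for an indirect access of the form values[index[i]]. It is not the output of the authors' compiler pass, and the abstract does not specify how that pass schedules its prefetches. The function name sum_indirect and the constant PREFETCH_DISTANCE are assumptions made for illustration, and the look-ahead value of 32 is a placeholder rather than a tuned figure. The sketch relies on __builtin_prefetch, which is provided by GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical look-ahead distance in loop iterations. A real value
 * would need tuning for each microarchitecture. */
#define PREFETCH_DISTANCE 32

/* Sums values[index[i]] for i in [0, n). The load from values depends on
 * a load from index, which makes this an indirect memory access. */
int64_t sum_indirect(const int32_t *values, const uint32_t *index, size_t n)
{
    int64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Prefetch the index array two distances ahead, so that the
         * real load of index[i + PREFETCH_DISTANCE] below is likely
         * to hit in cache by the time the loop reaches it. */
        if (i + 2 * PREFETCH_DISTANCE < n)
            __builtin_prefetch(&index[i + 2 * PREFETCH_DISTANCE], 0, 3);

        /* Prefetch the indirect target one distance ahead. Reading
         * index[...] here is an ordinary load, so it must stay in
         * bounds; the guard ensures that. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&values[index[i + PREFETCH_DISTANCE]], 0, 3);

        total += values[index[i]];
    }
    return total;
}
```

The prefetches are only hints, so the function returns the same result with or without them. Whether they help, and which look-ahead distance works best, depends on the target core, which is consistent with the architecture dependence the abstract describes.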

Updated: 2019-06-18