Automatic Sublining for Efficient Sparse Memory Accesses, ACM Transactions on Architecture and Code Optimization


Automatic Sublining for Efficient Sparse Memory Accesses
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2021-05-10, DOI: 10.1145/3452141
Wim Heirman ₁, Stijn Eyerman ₁, Kristof Du Bois ₁, Ibrahim Hur ₂


Sparse memory accesses, which are scattered accesses to single elements of a large data structure, are a challenge for current processor architectures. Their lack of spatial and temporal locality and their irregularity make caches and traditional stream prefetchers useless. Furthermore, performing standard caching and prefetching on sparse accesses wastes precious memory bandwidth and thrashes caches, deteriorating performance for regular accesses. Bypassing prefetchers and caches for sparse accesses, and fetching only a single element (e.g., 8 B) from main memory (a subline access), can solve these issues. Deciding which accesses to handle as sparse accesses and which as regular cached accesses is a challenging task with a large potential impact on performance. Treating sparse accesses as regular accesses reduces performance, but so does failing to cache accesses that do have locality, which significantly increases their latency and bandwidth consumption. Furthermore, this decision depends on the dynamic environment, such as input set characteristics and system load, making a static decision by the programmer or compiler suboptimal. We propose the Instruction Spatial Locality Estimator (ISLE), a hardware detector that finds instructions that access isolated words in a sea of unused data. These sparse accesses are dynamically converted into uncached subline accesses, while regular accesses remain cached. ISLE does not require modifying source code or binaries, and adapts automatically to a changing environment (input data, available bandwidth, etc.). We apply ISLE to a graph analytics processor running sparse graph workloads, and show that ISLE outperforms a baseline without subline accesses, manual sublining, and prior work on detecting sparse accesses.
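The abstract does not describe how ISLE works internally, so the following is only a toy software model of the general idea: attribute each fetched line to the instruction that fetched it, count how many words of that line were touched before eviction, and flag instructions whose lines are mostly unused. Everything here is an assumption made for illustration, not the paper's design: the 64 B line size, the class name `ToyLocalityEstimator`, the threshold, and the sample count are all hypothetical.

```python
# Toy model only: NOT the ISLE hardware design, which the abstract does not detail.
LINE_BYTES = 64  # assumed cache line size (not stated in the abstract)
WORD_BYTES = 8   # element size, matching the abstract's "e.g., 8 B"


class ToyLocalityEstimator:
    """Per-instruction estimate of how many words of each fetched line are used."""

    def __init__(self, sparse_threshold=1.5, min_samples=4):
        self.sparse_threshold = sparse_threshold  # hypothetical cutoff (words per line)
        self.min_samples = min_samples            # hypothetical warm-up length
        self.live = {}        # line number -> (fetching pc, set of touched word indices)
        self.words_used = {}  # pc -> words touched across its evicted lines
        self.lines_seen = {}  # pc -> number of evicted lines it fetched

    def access(self, pc, addr):
        """Record that instruction `pc` touched byte address `addr`."""
        line = addr // LINE_BYTES
        word = (addr % LINE_BYTES) // WORD_BYTES
        if line not in self.live:
            # The first toucher is treated as the instruction that fetched the line.
            self.live[line] = (pc, set())
        self.live[line][1].add(word)

    def evict(self, line):
        """On eviction, credit the line's usage to the instruction that fetched it."""
        pc, words = self.live.pop(line)
        self.words_used[pc] = self.words_used.get(pc, 0) + len(words)
        self.lines_seen[pc] = self.lines_seen.get(pc, 0) + 1

    def is_sparse(self, pc):
        """True if lines fetched by `pc` had, on average, few words used."""
        n = self.lines_seen.get(pc, 0)
        if n < self.min_samples:
            return False  # not enough evidence: stay with regular cached accesses
        return self.words_used[pc] / n < self.sparse_threshold


def bytes_fetched(num_accesses, sublined):
    """Memory traffic for accesses that each miss, with or without sublining."""
    return num_accesses * (WORD_BYTES if sublined else LINE_BYTES)


est = ToyLocalityEstimator()
for i in range(64):                 # pc 0x10 streams through 8 full lines
    est.access(0x10, i * WORD_BYTES)
for i in range(8):                  # pc 0x20 touches one word in each of 8 far-apart lines
    est.access(0x20, (1 << 20) + i * 4096)
for line in list(est.live):
    est.evict(line)

print(est.is_sparse(0x10), est.is_sparse(0x20))
print(bytes_fetched(8, sublined=False), bytes_fetched(8, sublined=True))
```

In this sketch the streaming instruction stays cached because its lines are fully used, while the scattered instruction is flagged as sparse. Under the assumed 64 B line, fetching only the 8 B element moves one eighth of the data per miss, which is the bandwidth saving the abstract attributes to subline accesses.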


Updated: 2021-05-10
