Cache simulation for irregular memory traffic on multi-core CPUs: Case study on performance models for sparse matrix–vector multiplication
Journal of Parallel and Distributed Computing (IF 3.8), Pub Date: 2020-06-09, DOI: 10.1016/j.jpdc.2020.05.020
James D. Trotter, Johannes Langguth, Xing Cai

Parallel computations with irregular memory access patterns are often limited by the memory subsystems of multi-core CPUs, though it can be difficult to pinpoint and quantify performance bottlenecks precisely. We present a method for estimating volumes of data traffic caused by irregular, parallel computations on multi-core CPUs with memory hierarchies containing both private and shared caches. Further, we describe a performance model based on these estimates that applies to bandwidth-limited computations. As a case study, we consider two standard algorithms for sparse matrix–vector multiplication, a widely used, irregular kernel. Using three different multi-core CPU systems and a set of matrices that induce a range of irregular memory access patterns, we demonstrate that our cache simulation combined with the proposed performance model accurately quantifies performance bottlenecks that would not be detected using standard best- or worst-case estimates of the data traffic volume.
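
To make the idea of simulation-based traffic estimates concrete, here is a minimal sketch in Python. It is not the authors' simulator: it assumes a single thread, a single fully associative LRU cache, a CSR-format matrix, and 64-byte cache lines, and it ignores the private/shared cache hierarchy the paper addresses. All function names and parameters are hypothetical. It counts cache-line misses for the irregular accesses to the input vector x, adds the streamed traffic for the matrix and output vector, and divides by an assumed memory bandwidth to get a bandwidth-limited time estimate.

```python
from collections import OrderedDict


def x_vector_misses(col_idx, cache_lines, line_bytes=64, value_bytes=8):
    """Count cache-line misses for the accesses x[col_idx[k]].

    Assumption: one fully associative LRU cache holding `cache_lines` lines.
    """
    per_line = line_bytes // value_bytes  # vector entries per cache line
    cache = OrderedDict()                 # keys ordered from least to most recently used
    misses = 0
    for j in col_idx:
        line = j // per_line
        if line in cache:
            cache.move_to_end(line)       # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def spmv_traffic_bytes(row_ptr, col_idx, cache_lines,
                       line_bytes=64, value_bytes=8, index_bytes=4):
    """Estimate memory traffic (bytes) of one CSR SpMV under the assumptions above."""
    nnz = len(col_idx)
    nrows = len(row_ptr) - 1
    # Matrix values, column indices, row pointers and y are assumed to stream once.
    streamed = (nnz * (value_bytes + index_bytes)
                + (nrows + 1) * index_bytes
                + nrows * value_bytes)
    # Each miss on x is assumed to transfer one full cache line.
    irregular = x_vector_misses(col_idx, cache_lines, line_bytes, value_bytes) * line_bytes
    return streamed + irregular


def bandwidth_limited_time(traffic_bytes, bandwidth_bytes_per_s):
    """Predicted run time (seconds) if memory bandwidth is the only bottleneck."""
    return traffic_bytes / bandwidth_bytes_per_s


# Tiny 3-row example whose column indices touch three distinct cache lines of x.
row_ptr = [0, 2, 4, 6]
col_idx = [0, 8, 0, 8, 16, 0]
for lines in (1, 2, 3):
    print(lines, x_vector_misses(col_idx, lines),
          spmv_traffic_bytes(row_ptr, col_idx, lines))
```

In this toy example, a one-line cache misses on every access to x (the worst-case estimate), a three-line cache misses once per distinct line (the best-case estimate), and a two-line cache falls in between, which is the kind of gap a simulation can resolve and a best- or worst-case bound cannot.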




Updated: 2020-06-09