Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units, Concurrency and Computation: Practice and Experience


Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2021-07-22, DOI: 10.1002/cpe.6515
José I. Aliaga¹, Hartwig Anzt²,³, Thomas Grützmacher², Enrique S. Quintana-Ortí⁴, Andrés E. Tomás¹,⁵


We contribute to the optimization of the sparse matrix-vector product by introducing a variant of the coordinate sparse matrix format that balances the workload distribution and compresses both the indexing arrays and the numerical information. Our approach is multi-platform: the realizations for general-purpose multicore processors and for graphics accelerators (GPUs) are built upon common principles but differ in implementation details, which are adapted to avoid thread divergence on GPUs and to maximize element-wise compression (i.e., for each matrix entry) on multicore architectures. Our evaluation on the last two generations of NVIDIA GPUs, as well as on Intel and AMD processors, demonstrates the benefits of the new kernels over the optimized implementations of the sparse matrix-vector product in NVIDIA's cuSPARSE and Intel's MKL, respectively.
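
To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' format or kernels. It assumes a plain coordinate (COO) matrix sorted by row, balances work by giving each worker an equal share of nonzeros rather than of rows, and shrinks the index arrays by storing them as offsets from per-block minima. The function names (`split_nonzeros`, `compress_block`, `coo_spmv_balanced`) and the offset scheme are assumptions made for illustration; the compression of numerical values mentioned in the abstract is not modeled, and the workers run sequentially here.

```python
from typing import Dict, List, Sequence, Tuple


def split_nonzeros(nnz: int, num_workers: int) -> List[Tuple[int, int]]:
    """Split [0, nnz) into num_workers contiguous chunks of nearly equal size."""
    base, extra = divmod(nnz, num_workers)
    bounds = []
    start = 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def compress_block(
    rows: Sequence[int], cols: Sequence[int]
) -> Tuple[int, int, List[int], List[int]]:
    """Hypothetical index compression: keep one base per block plus small offsets."""
    if not rows:
        return 0, 0, [], []
    r0, c0 = min(rows), min(cols)
    return r0, c0, [r - r0 for r in rows], [c - c0 for c in cols]


def coo_spmv_balanced(
    n_rows: int,
    rows: Sequence[int],
    cols: Sequence[int],
    vals: Sequence[float],
    x: Sequence[float],
    num_workers: int,
) -> List[float]:
    """Compute y = A @ x for a COO matrix, one equal-sized nonzero chunk per worker."""
    y = [0.0] * n_rows
    for lo, hi in split_nonzeros(len(vals), num_workers):
        r0, c0, dr, dc = compress_block(rows[lo:hi], cols[lo:hi])
        # Each worker accumulates privately, then merges into y; a row that
        # straddles two chunks receives a contribution from both.
        partial: Dict[int, float] = {}
        for k in range(hi - lo):
            i = r0 + dr[k]  # rebuild the absolute row index
            j = c0 + dc[k]  # rebuild the absolute column index
            partial[i] = partial.get(i, 0.0) + vals[lo + k] * x[j]
        for i, s in partial.items():
            y[i] += s
    return y


# Example matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in COO form.
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
y = coo_spmv_balanced(3, rows, cols, vals, [1.0, 2.0, 3.0], num_workers=2)
```

Splitting by nonzeros keeps every worker's load within one entry of the others regardless of how unevenly rows are populated, at the cost of a merge step for rows shared between chunks. In a real parallel kernel that merge would need an atomic update or a reduction.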


Updated: 2021-07-22
