


A Split Execution Model for SpTRSV
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2021-04-21, DOI: 10.1109/tpds.2021.3074501
Najeeb Ahmad, Buse Yilmaz, Didem Unat

Sparse Triangular Solve (SpTRSV) is an important and extensively used kernel in scientific computing. Parallelism within SpTRSV depends on the matrix sparsity pattern and, in many cases, is non-uniform from one computational step to the next. When the computational steps have contrasting parallelism characteristics (some steps are more parallel, others more sequential in nature), the performance of any single SpTRSV algorithm may be limited by this contrast. In this work, we propose a split-execution model for SpTRSV that automatically divides the SpTRSV computation into two sub-SpTRSV systems and an SpMV, such that one of the sub-SpTRSVs has more parallelism than the other. Each sub-SpTRSV is then computed using a different SpTRSV algorithm, possibly executed on a different platform (CPU or GPU). By analyzing the SpTRSV Directed Acyclic Graph (DAG) and matrix sparsity features, we use a heuristics-based approach to (i) automatically determine the suitability of an SpTRSV for split execution, (ii) find the appropriate split point, and (iii) execute SpTRSV in a split fashion using two SpTRSV algorithms while managing any required inter-platform communication. Experimental evaluation of the execution model on two CPU-GPU machines, with a dataset of 327 matrices from the SuiteSparse Matrix Collection, shows that our approach correctly selects the fastest SpTRSV method (split or unsplit) for 88 percent of matrices on the Intel Xeon Gold (6148) + NVIDIA Tesla V100 platform and 83 percent on the Intel Core i7 + NVIDIA GTX 1080 Ti platform, achieving speedups of up to 10x and 6.36x, respectively.
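
The split described in the abstract can be illustrated with the standard block form of a lower-triangular system: with the rows cut at a split point s, solving L x = b reduces to solving the top-left triangle for x1, updating the remaining right-hand side with an SpMV over the bottom-left block, and solving the bottom-right triangle for x2. The Python sketch below is a minimal sequential illustration of that decomposition only. The names `level_sets`, `solve_block`, and `split_sptrsv`, the row-list matrix layout, and the hand-picked split point are all assumptions made for this example; the paper's heuristics for choosing the split point and its CPU/GPU algorithms are not reproduced here.

```python
def level_sets(rows):
    """Dependency level of each row of a lower-triangular matrix.

    rows[i] is a list of (col, value) pairs with col <= i (assumed layout).
    Rows with the same level have no dependencies on each other, so a
    level with many rows has more parallelism than one with few.
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] for j, _ in row if j < i]
        level[i] = 1 + max(deps) if deps else 0
    return level


def solve_block(rows, rhs, x, lo, hi):
    """Forward substitution on the diagonal block covering indices lo..hi-1.

    Only entries with lo <= col < i are used; contributions from columns
    before `lo` are assumed to be already folded into `rhs`.
    """
    for i in range(lo, hi):
        acc = rhs[i]
        diag = None
        for j, v in rows[i]:
            if j == i:
                diag = v
            elif lo <= j < i:
                acc -= v * x[j]
        x[i] = acc / diag


def split_sptrsv(rows, b, split):
    """Solve L x = b as two sub-solves plus one SpMV, cut at row `split`."""
    n = len(rows)
    x = [0.0] * n
    rhs = list(b)
    solve_block(rows, rhs, x, 0, split)   # sub-SpTRSV 1: top-left triangle
    for i in range(split, n):             # SpMV: rhs2 = b2 - A * x1
        for j, v in rows[i]:
            if j < split:
                rhs[i] -= v * x[j]
    solve_block(rows, rhs, x, split, n)   # sub-SpTRSV 2: bottom-right triangle
    return x


# Small hypothetical 4x4 lower-triangular system.
rows = [
    [(0, 2.0)],
    [(1, 4.0)],
    [(0, 1.0), (2, 1.0)],
    [(1, 2.0), (2, 3.0), (3, 2.0)],
]
b = [2.0, 4.0, 3.0, 13.0]

levels = level_sets(rows)          # [0, 0, 1, 2]
x_split = split_sptrsv(rows, b, 2)  # [1.0, 1.0, 2.0, 2.5]
```

In this toy matrix the first two rows share level 0 and are independent, while the last two form a sequential chain, so cutting at row 2 separates a more parallel part from a more sequential one. In a real setting each of the two `solve_block` calls would be replaced by a different SpTRSV algorithm, possibly on a different device, with `x1` transferred between them.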


Updated: 2021-04-21
