A parallel sparse triangular solve algorithm based on dependency elimination of the solution vector
Cluster Computing (IF 4.4), Pub Date: 2020-10-03, DOI: 10.1007/s10586-020-03188-x
Song Jin, Songwei Pei, Yu Wang, Yincheng Qi

Sparse triangular solve (SpTRSV) is an important kernel in many scientific computing applications. Accelerating SpTRSV by parallelizing the solution process is traditionally regarded as a challenging task. Dependencies among the variables in the solution process not only restrict the achievable parallelism but also introduce a large synchronization overhead among the parallel tasks. Moreover, a time-consuming pre-processing phase is commonly required to identify the calculations that can be parallelized. However, we have observed that a large number of dependencies among the variables can be eliminated if we first calculate only partial values of the variables and later add them together to obtain the final values. With this strategy, starting to solve a variable need not wait until all of its prerequisite variables have been solved. Consequently, the parallelism of SpTRSV can be increased significantly. In this paper, we turn these observations into a subtree-based parallel algorithm to accelerate SpTRSV. The proposed algorithm calculates partial values of the variables during an implicit subtree traversal and uses hardware atomic operations to accumulate the partial values. This introduces no pre-processing overhead and avoids any barrier synchronization among the parallel threads. We evaluate the proposed algorithm on 2135 matrices from the SuiteSparse Matrix Collection on a generic GPU platform. Experimental results demonstrate that our scheme outperforms the state-of-the-art GPU and CPU vendor libraries on 1949 and 1782 matrices, respectively. Compared with the latest synchronization-free method, our scheme performs better on 1779 matrices.
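
The idea of accumulating partial values can be illustrated with a small sequential sketch. It is not the authors' subtree-based GPU algorithm, only a column-oriented forward substitution in which each solved variable immediately scatters its contribution into the running sums of later variables. The function name `sptrsv_scatter`, the CSC layout, and the convention that the diagonal entry is stored first in each column are all assumptions made for illustration. In a parallel version, the marked update is the step that would presumably be done with a hardware atomic operation.

```python
def sptrsv_scatter(col_ptr, row_idx, vals, b):
    """Solve L x = b for a lower-triangular L stored in CSC form.

    Assumption (illustrative only): within each column the diagonal
    entry comes first, followed by the strictly-lower entries.
    """
    n = len(b)
    partial = [float(v) for v in b]  # running partial values, start at b
    x = [0.0] * n
    for j in range(n):
        start, end = col_ptr[j], col_ptr[j + 1]
        # All contributions to variable j have been accumulated by now.
        x[j] = partial[j] / vals[start]
        for k in range(start + 1, end):
            i = row_idx[k]
            # Scatter a partial contribution to a later variable.
            # On parallel hardware this update would need to be atomic.
            partial[i] -= vals[k] * x[j]
    return x


# L = [[2, 0, 0],
#      [1, 3, 0],
#      [4, 5, 6]]
col_ptr = [0, 3, 5, 6]
row_idx = [0, 1, 2, 1, 2, 2]
vals = [2.0, 1.0, 4.0, 3.0, 5.0, 6.0]
b = [2.0, 7.0, 32.0]

print(sptrsv_scatter(col_ptr, row_idx, vals, b))
```

The sketch shows why the scatter form relaxes dependencies: the contribution of a solved variable is added to later variables as soon as it is known, instead of each later variable waiting to gather all of its prerequisites at once.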




Updated: 2020-10-04