ALBUS: A method for efficiently processing SpMV using SIMD and Load balancing
Future Generation Computer Systems (IF 7.5), Pub Date: 2020-11-04, DOI: 10.1016/j.future.2020.10.036
Haodong Bian, Jianqiang Huang, Lingbin Liu, Dongqiang Huang, Xiaoying Wang

SpMV (sparse matrix–vector multiplication) is widely used in many fields, and improving its performance has been the pursuit of many researchers. Running SpMV in parallel on multi-core processors has become a standard approach. In practice, however, the non-zero elements of many sparse matrices are not evenly distributed across rows, so parallelization without preprocessing loses a large amount of performance to load imbalance. In this paper, we propose ALBUS (Absolute Load Balancing Using SIMD (Single Instruction Multiple Data)), a method for efficiently processing SpMV using load balancing and SIMD vectorization. On the one hand, ALBUS balances the load across cores; on the other hand, it takes full advantage of the SIMD vectorization available on the CPU. We selected 20 regular matrices and 20 irregular matrices to form the benchmark suite, and compared the SpMV performance of ALBUS, CSR5 (Compressed Sparse Row 5), Merge (Merge-based SpMV), and MKL (Math Kernel Library) under the same conditions. On the E5-2670 v3 CPU platform, for the 20 regular matrices, ALBUS achieves average speedups of 1.59x, 1.32x, and 1.48x (up to 2.53x, 2.22x, and 2.31x) over CSR5, Merge, and MKL, respectively. For the 20 irregular matrices, ALBUS achieves average speedups of 1.38x, 1.42x, and 2.44x (up to 2.33x, 2.24x, and 5.37x) over CSR5, Merge, and MKL, respectively.
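The load-imbalance problem described above can be illustrated with a minimal Python sketch. It is not the ALBUS implementation, whose details the abstract does not give; it only shows the general idea of dividing work by non-zero count rather than by row count on a CSR matrix. The function names (`spmv_csr`, `nnz_chunks`, `spmv_nnz_balanced`) are hypothetical, the "workers" are simulated by a sequential loop, and SIMD vectorization is not shown.

```python
from bisect import bisect_right


def spmv_csr(row_ptr, col_idx, vals, x):
    """Reference row-by-row CSR SpMV: y = A @ x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def nnz_chunks(nnz, parts):
    """Split [0, nnz) into contiguous ranges whose sizes differ by at most 1."""
    return [(p * nnz // parts, (p + 1) * nnz // parts) for p in range(parts)]


def spmv_nnz_balanced(row_ptr, col_idx, vals, x, parts):
    """CSR SpMV in which each simulated worker gets an equal share of non-zeros.

    Assumption for illustration: chunks run one after another here. A real
    multi-threaded version would have to combine the partial sums of rows
    that straddle a chunk boundary without a data race.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for start, end in nnz_chunks(row_ptr[-1], parts):
        if start == end:
            continue  # more workers than non-zeros: nothing to do
        # Row that owns non-zero `start`: the last i with row_ptr[i] <= start.
        row = bisect_right(row_ptr, start) - 1
        k = start
        while k < end:
            stop = min(end, row_ptr[row + 1])  # end of this row or of the chunk
            partial = 0.0
            for j in range(k, stop):
                partial += vals[j] * x[col_idx[j]]
            y[row] += partial  # a boundary row receives several partial sums
            k = stop
            row += 1
    return y
```

With a row-based split, a worker that happens to receive one very long row does most of the work; with the split above, every worker processes nearly the same number of multiply-adds regardless of how the non-zeros are distributed over rows.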




Updated: 2020-11-18