Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-06-30, DOI: arxiv-2006.16767
Min Li, Yulong Ao, and Chao Yang

Despite numerous efforts to optimize the performance of Sparse Matrix and Vector Multiplication (SpMV) on modern hardware architectures, little work has been done on its sparse counterpart, Sparse Matrix and Sparse Vector Multiplication (SpMSpV), let alone on handling input vectors of varied sparsity. The key challenge is that the optimal choice of SpMV/SpMSpV kernel varies with the sparsity level, the data distribution, and the compute platform, so a static choice does not suffice. In this paper, we propose an adaptive SpMV/SpMSpV framework that automatically selects the appropriate SpMV/SpMSpV kernel on GPUs for any sparse matrix and vector at runtime. Based on a systematic analysis of key factors such as computing pattern, workload distribution, and write-back strategy, eight candidate SpMV/SpMSpV kernels are encapsulated into the framework to achieve high performance in a seamless manner. A comprehensive study of machine-learning-based kernel selectors is performed, from both accuracy and overhead perspectives, to choose the kernel and adapt to the variety of both inputs and hardware. Experiments demonstrate that the adaptive framework can substantially outperform the previous state-of-the-art in real-world applications on NVIDIA Tesla K40m, P100, and V100 GPUs.
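
To make the idea of runtime kernel selection concrete, here is a minimal CPU-side sketch in Python. It is not the authors' framework: the paper describes eight GPU kernels and a machine-learning-based selector, whereas this sketch uses two plain-Python stand-in kernels and a simple density threshold. All function names and the 0.05 default threshold are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only. The kernels are pure-Python stand-ins for GPU
# kernels, and the threshold rule is a placeholder for the paper's
# machine-learning-based selector. All names here are hypothetical.

def spmv_csr(n_rows, row_ptr, col_idx, vals, x_dense):
    """Row-based SpMV: y = A x, with A in CSR format and x stored densely."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x_dense[col_idx[k]]
    return y


def spmspv_csc(n_rows, col_ptr, row_idx, vals, x_idx, x_vals):
    """Column-based SpMSpV: y = A x, with A in CSC format and x sparse.

    Only the columns that match nonzero entries of x are visited.
    """
    y = [0.0] * n_rows
    for j, xj in zip(x_idx, x_vals):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += vals[k] * xj
    return y


def choose_kernel(x_nnz, n_cols, threshold=0.05):
    """Pick a kernel from the density of x (assumed heuristic, not the paper's)."""
    return "spmspv" if x_nnz / n_cols < threshold else "spmv"


def adaptive_multiply(n_rows, n_cols, csr, csc, x_idx, x_vals, threshold=0.05):
    """Dispatch to one of the two kernels; returns (kernel_name, y)."""
    kernel = choose_kernel(len(x_idx), n_cols, threshold)
    if kernel == "spmspv":
        return kernel, spmspv_csc(n_rows, *csc, x_idx, x_vals)
    # Densify x before calling the row-based kernel.
    x_dense = [0.0] * n_cols
    for j, v in zip(x_idx, x_vals):
        x_dense[j] = v
    return kernel, spmv_csr(n_rows, *csr, x_dense)
```

Here `csr` and `csc` are each a `(pointer, index, values)` triple describing the same matrix, and `x_idx`/`x_vals` hold the positions and values of the nonzeros of the input vector. Both kernels compute the same product; they differ only in how much of the matrix they touch, which is why the better choice depends on the sparsity of the input vector.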

Updated: 2020-10-13