Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM
arXiv - CS - Mathematical Software. Pub Date: 2020-07-26, DOI: arxiv-2007.13055
Zijing Gu

We implemented and optimized matrix multiplications between dense and block-sparse matrices on CUDA. We leveraged TVM, a deep learning compiler, to explore the schedule space of the operation and generate efficient CUDA code. With TVM's automatic parameter tuning, our implementation, which is based on cross-thread reduction, achieved performance competitive with or better than other state-of-the-art frameworks.
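To make the operation concrete, here is a minimal pure-Python reference for multiplying a dense matrix by a block-sparse one. It is only a sketch under stated assumptions, not the paper's CUDA kernel: the abstract does not name a storage format, so the block-sparse-row (BSR) layout with `data`, `indices`, and `indptr` arrays, the choice to compute `y = x @ W.T`, and the function name `bsr_dense_matmul` are all illustrative. The innermost accumulation is the kind of reduction that a GPU schedule could split across threads.

```python
def bsr_dense_matmul(x, data, indices, indptr, block_shape):
    """Reference y = x @ W.T, with W stored in an assumed BSR layout.

    x        : dense matrix as a list of rows, shape (m, k)
    data     : list of nonzero blocks, each a bs_r x bs_c list of rows
    indices  : block-column index of each block in `data`
    indptr   : indptr[b]..indptr[b+1] delimits the blocks of block-row b
    """
    bs_r, bs_c = block_shape
    n_block_rows = len(indptr) - 1
    y = [[0.0] * (n_block_rows * bs_r) for _ in range(len(x))]
    for i, x_row in enumerate(x):
        for br in range(n_block_rows):
            # Only the stored (nonzero) blocks of this block-row are visited.
            for p in range(indptr[br], indptr[br + 1]):
                bc = indices[p]
                block = data[p]
                for r in range(bs_r):
                    # Reduction over the block's columns; on a GPU this sum
                    # is a candidate for a cross-thread reduction.
                    acc = 0.0
                    for c in range(bs_c):
                        acc += x_row[bc * bs_c + c] * block[r][c]
                    y[i][br * bs_r + r] += acc
    return y


# Hypothetical 4x4 W with two nonzero 2x2 blocks on the block diagonal.
data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
indices = [0, 1]
indptr = [0, 1, 2]
x = [[1, 1, 1, 1]]
y = bsr_dense_matmul(x, data, indices, indptr, (2, 2))
```

Because only stored blocks are visited, the work scales with the number of nonzero blocks rather than with the full size of W, which is the property a tuned GPU schedule would exploit.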

Updated: 2020-07-28