Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM

arXiv - CS - Mathematical Software, Pub Date: 2020-07-26, DOI: arxiv-2007.13055
Zijing Gu

We implemented and optimized matrix multiplications between dense and block-sparse matrices on CUDA. We leveraged TVM, a deep learning compiler, to explore the schedule space of the operation and generate efficient CUDA code. With automatic parameter tuning in TVM, our implementation based on cross-thread reduction achieved performance competitive with or better than other state-of-the-art frameworks.
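To make the operation concrete, here is a plain-Python model of a dense × block-sparse product. It is an illustrative sketch, not the paper's TVM/CUDA implementation. It assumes the sparse operand W (N × K) is stored in block compressed sparse row (BSR) form as `data`, `indices`, `indptr`, and computes Y = X·Wᵀ, so that each output element reduces over one row of nonzero blocks. The helpers `dense_to_bsr`, `tree_reduce`, and `bsr_dense_matmul` and the `num_threads` parameter are hypothetical names. The inner loop only simulates how a cross-thread reduction splits one output's reduction across threads and then combines the partial sums in a log-depth tree.

```python
# Hypothetical plain-Python model of dense x block-sparse (BSR) matmul with a
# simulated cross-thread reduction. Not the paper's TVM/CUDA kernel.
from typing import List, Tuple

Matrix = List[List[float]]


def dense_to_bsr(W: Matrix, bs_r: int, bs_c: int) -> Tuple[list, List[int], List[int]]:
    """Convert dense W (N x K) into BSR arrays (data, indices, indptr).
    Only blocks with at least one nonzero entry are stored."""
    n_rows, n_cols = len(W), len(W[0])
    assert n_rows % bs_r == 0 and n_cols % bs_c == 0
    data, indices, indptr = [], [], [0]
    for bi in range(n_rows // bs_r):
        for bj in range(n_cols // bs_c):
            block = [row[bj * bs_c:(bj + 1) * bs_c] for row in W[bi * bs_r:(bi + 1) * bs_r]]
            if any(v != 0 for r in block for v in r):
                data.append(block)      # bs_r x bs_c dense block
                indices.append(bj)      # block-column index along K
        indptr.append(len(data))        # end of block row bi
    return data, indices, indptr


def tree_reduce(partials: List[float]) -> float:
    """Log-depth pairwise sum, mimicking threads combining partial sums
    (e.g. via shared memory). Length must be a power of two."""
    vals = list(partials)
    stride = len(vals) // 2
    while stride > 0:
        for t in range(stride):  # these adds would run in parallel on a GPU
            vals[t] += vals[t + stride]
        stride //= 2
    return vals[0]


def bsr_dense_matmul(X, data, indices, indptr, bs_r, bs_c, num_threads=4) -> Matrix:
    """Y = X @ W^T, with X dense (M x K) and W (N x K) in BSR form."""
    assert num_threads > 0 and num_threads & (num_threads - 1) == 0
    M, N = len(X), (len(indptr) - 1) * bs_r
    Y = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for i in range(len(indptr) - 1):  # block row of W = block of output columns
            # Flattened reduction space for this block row: (nonzero block j, column c).
            terms = [(j, c) for j in range(indptr[i], indptr[i + 1]) for c in range(bs_c)]
            for r in range(bs_r):
                partials = [0.0] * num_threads
                for idx, (j, c) in enumerate(terms):
                    # Simulated thread (idx % num_threads) accumulates a strided slice.
                    partials[idx % num_threads] += X[m][indices[j] * bs_c + c] * data[j][r][c]
                Y[m][i * bs_r + r] = tree_reduce(partials)
    return Y


if __name__ == "__main__":
    X = [[1, 2, 3, 4], [5, 6, 7, 8]]
    W = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 0, 0], [0, 0, 3, 4]]
    data, indices, indptr = dense_to_bsr(W, bs_r=2, bs_c=2)
    print(bsr_dense_matmul(X, data, indices, indptr, 2, 2))
    # -> [[1.0, 4.0, 0.0, 25.0], [5.0, 12.0, 0.0, 53.0]]
```

On a GPU, the strided split and the tree combine would map to the threads of one block exchanging partial sums through shared memory or warp shuffles. In a TVM-style workflow, choices such as the block shape and the number of cooperating threads are the kind of schedule parameters that automatic tuning would search over.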

Updated: 2020-07-28