A High-Performance Sparse Tensor Algebra Compiler in Multi-Level IR
arXiv - CS - Programming Languages, Pub Date: 2021-02-09, DOI: arxiv-2102.05187
Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, Gokcen Kestor

Tensor algebra is widely used in many applications, such as scientific computing, machine learning, and data analytics. Tensors that represent real-world data are usually large and sparse. Tens of storage formats have been designed for sparse matrices and tensors, and the performance of sparse tensor operations depends on the target architecture and the selected sparse format. This makes it challenging to implement and optimize every tensor operation of interest and to port the code from one architecture to another. We propose COMET, a tensor algebra domain-specific language (DSL) and compiler infrastructure that automatically generates kernels for mixed sparse-dense tensor algebra operations. The proposed DSL provides high-level programming abstractions that resemble the familiar Einstein notation to represent tensor algebra operations. The compiler performs code optimizations and transformations for efficient code generation while covering a wide range of tensor storage formats. The COMET compiler also leverages data reordering to improve spatial or temporal locality for better performance. Our results show that the automatically generated kernels outperform those of TACO, the state-of-the-art sparse tensor algebra compiler, with up to 20.92x, 6.39x, and 13.9x performance improvement for parallel SpMV, SpMM, and TTM, respectively.
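To make the dependence on storage format concrete, the following is a minimal sketch of sparse matrix-vector multiplication (SpMV), written in Einstein-style notation as y[i] = A[i,j] * x[j], for two common formats: CSR and COO. It is plain Python for illustration only. It is not COMET DSL syntax and not code generated by COMET, and the function names `spmv_csr` and `spmv_coo` are hypothetical.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """y[i] = A[i,j] * x[j] with A stored in CSR (compressed sparse row)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def spmv_coo(n_rows, rows, cols, vals, x):
    """Same operation with A stored in COO (coordinate list)."""
    y = [0.0] * n_rows
    # One flat loop over nonzeros; each carries its own (row, col) pair.
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


# Example matrix (assumed for illustration):
# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 2, 0, 1]
row_ptr = [0, 2, 3, 5]          # CSR row offsets
coo_rows = [0, 0, 1, 2, 2]      # COO row coordinates
x = [1.0, 2.0, 3.0]

y_csr = spmv_csr(row_ptr, col_idx, vals, x)
y_coo = spmv_coo(3, coo_rows, col_idx, vals, x)
```

Both functions compute the same result, but the loop structure differs with the format: CSR iterates row by row through a compressed offset array, while COO walks a flat list of coordinates. A hand-written library needs one such kernel per operation and format, which is the combinatorial burden that the abstract says a compiler-based approach aims to remove.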

Updated: 2021-02-11