Automatic Generation of Efficient Sparse Tensor Format Conversion Routines,arXiv - CS - Programming Languages


Automatic Generation of Efficient Sparse Tensor Format Conversion Routines
arXiv - CS - Programming Languages, Pub Date: 2020-01-08, DOI: arxiv-2001.02609
Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor's nonzeros in memory. This lets a compiler emit code that performs complex remappings of nonzeros when converting between formats. We also develop a query language that can extract statistics about sparse tensors, and we show how to emit efficient analysis code that computes such queries. Finally, we define an abstract interface that captures how data structures for storing a tensor can be efficiently assembled given specific statistics about the tensor. Disparate formats can implement this common interface, thus letting a compiler emit optimized sparse tensor conversion code for arbitrary combinations of many formats without hard-coding for any specific combination. Our evaluation shows that the technique generates sparse tensor conversion routines with performance between 1.00 and 2.01$\times$ that of hand-optimized versions in SPARSKIT and Intel MKL, two popular sparse linear algebra libraries. And by emitting code that avoids materializing temporaries, which both libraries need for many combinations of source and target formats, our technique outperforms those libraries by 1.78 to 4.01$\times$ for CSC/COO to DIA/ELL conversion.
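To make the three phases concrete, here is a minimal hand-written sketch of a COO to DIA conversion in Python. It is not the code the paper's compiler emits; the function names (`remap`, `coo_to_dia`), the row-indexed DIA layout (`data[d][i]` holds `A[i, i + offsets[d]]`), and the assumption that the COO input has no duplicate coordinates are all illustrative choices made here.

```python
def remap(i, j):
    """Coordinate remapping (hypothetical helper): (i, j) -> (j - i, i).

    Nonzeros on the same diagonal share the first remapped coordinate.
    """
    return j - i, i


def coo_to_dia(nrows, rows, cols, vals):
    """Convert a COO matrix to a row-indexed DIA layout.

    Assumes no duplicate (row, col) pairs in the input.
    Returns (offsets, data) where data[d][i] holds A[i, i + offsets[d]].
    """
    # Analysis: query which diagonals contain at least one nonzero.
    # The remapping is applied on the fly, without a remapped temporary.
    offsets = sorted({remap(i, j)[0] for i, j in zip(rows, cols)})
    slot = {k: d for d, k in enumerate(offsets)}

    # Assembly: allocate storage sized from the analysis result, then
    # scatter each nonzero into place; untouched entries stay as padding.
    data = [[0.0] * nrows for _ in offsets]
    for i, j, v in zip(rows, cols, vals):
        k, r = remap(i, j)
        data[slot[k]][r] = v
    return offsets, data


# Usage: the 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in COO form.
offsets, data = coo_to_dia(
    3,
    rows=[0, 0, 1, 2, 2],
    cols=[0, 2, 1, 0, 2],
    vals=[1.0, 2.0, 3.0, 4.0, 5.0],
)
```

The sketch makes two passes over the source nonzeros (one for analysis, one for assembly) rather than first converting to an intermediate format, which is the kind of temporary the abstract says the generated code avoids.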


Updated: 2020-07-01
