KernelFaRer
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2021-06-28, DOI: 10.1145/3459010
João P. L. De Carvalho 1 , Braedy Kuzma 2 , Ivan Korostelev 2 , José Nelson Amaral 2 , Christopher Barton 3 , José Moreira 4 , Guido Araujo 1

Well-crafted libraries deliver much higher performance than code generated by sophisticated application programmers using advanced optimizing compilers. When a code pattern for which a well-tuned library implementation exists is found in the source code of an application, the highest-performing solution is to replace the pattern with a call to the library. Past idiom-recognition solutions either required pattern-matching machinery outside of the compilation framework or were so brittle that they failed even for minor variants of the pattern source code. This article introduces Kernel Find & Replacer (KernelFaRer), an idiom recognizer implemented entirely in the existing LLVM compiler framework. The versatility of KernelFaRer is demonstrated by matching and replacing two linear algebra idioms: general matrix-matrix multiplication (GEMM) and symmetric rank-2k update (SYR2K). Both GEMM and SYR2K are used extensively in scientific computation, and GEMM is also a central building block for deep learning and computer graphics algorithms. The idiom recognition in KernelFaRer is much more robust than alternative solutions, has a much lower compilation overhead, and is fully integrated in the broadly used LLVM compilation tools. KernelFaRer replaces existing GEMM and SYR2K idioms with computations performed by BLAS, Eigen, MKL (Intel’s x86), ESSL (IBM’s PowerPC), and BLIS (AMD). Performance gains that reach 2000× over hand-crafted source code compiled at the highest optimization level demonstrate that replacing application code with a library call is a performant solution.
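
For readers unfamiliar with the two idioms named in the abstract, the sketch below shows what hand-written GEMM and SYR2K loop nests typically look like in application code. It is an illustration only, not code from the article: the function names `naive_gemm` and `naive_syr2k`, the row-major layout, and the choice to update only the lower triangle in SYR2K are all assumptions made for this example, and the abstract does not state which source variants KernelFaRer actually matches.

```c
#include <stddef.h>

/* Hypothetical hand-written GEMM idiom: C = alpha * A * B + beta * C.
 * Row-major storage (an assumption): A is m x k, B is k x n, C is m x n. */
void naive_gemm(size_t m, size_t n, size_t k, double alpha,
                const double *A, const double *B, double beta, double *C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}

/* Hypothetical hand-written SYR2K idiom:
 * C = alpha * (A * B^T + B * A^T) + beta * C.
 * A and B are n x k, C is n x n and symmetric, so this sketch
 * updates only the lower triangle (j <= i) and leaves the rest untouched. */
void naive_syr2k(size_t n, size_t k, double alpha,
                 const double *A, const double *B, double beta, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j <= i; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[j * k + p]
                     + B[i * k + p] * A[j * k + p];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

With a CBLAS implementation available, the GEMM loop nest above computes the same result as `cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, A, k, B, n, beta, C, n)`. That call is shown only to indicate the kind of library routine such a loop nest could be replaced with; the code KernelFaRer actually emits may differ.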

Updated: 2021-06-28