Automatic Generation of High-Performance FFT Kernels on Arm and x86 CPUs
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-08-01, DOI: 10.1109/tpds.2020.2977629
Zhihao Li, Haipeng Jia, Yunquan Zhang, Tun Chen, Liang Yuan, Richard Vuduc

This article presents AutoFFT, a template-based code generation framework that can automatically generate high-performance FFT kernels for all natural-number radices. AutoFFT uses the Cooley-Tukey FFT algorithm, which exploits the symmetric and periodic properties of the DFT matrix, as its outer parallelization framework. Because butterflies are the core operations of the Cooley-Tukey algorithm, we explore additional symmetric and periodic properties of the DFT matrix and formulate multiple optimized calculation templates that further reduce the number of floating-point operations in butterflies of arbitrary natural-number radix. To fully exploit hardware resources, we encapsulate a series of optimizations in an assembly template optimizer. Given any DFT problem, AutoFFT automatically generates C FFT kernels from these calculation templates and converts them into efficient assembly kernels using the template optimizer. Through a series of experiments on Arm, Intel, and AMD processors, we show that AutoFFT-generated kernels can outperform those in the Fastest Fourier Transform in the West (FFTW), the Arm Performance Libraries (ARMPL), and the Intel Math Kernel Library (MKL).
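
To make the roles of the outer Cooley-Tukey framework and the radix-r butterfly concrete, the following is a minimal textbook sketch in Python. It is not AutoFFT's generated code or its calculation templates: the function names (`dft_butterfly`, `smallest_factor`, `fft`) are hypothetical, and the butterfly is a deliberately naive direct DFT that performs on the order of r² complex multiplications, which is the kind of cost the abstract says optimized templates aim to reduce.

```python
import cmath


def dft_butterfly(x):
    """Naive radix-r butterfly: a direct r-point DFT (illustrative only)."""
    r = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / r) for n in range(r))
        for k in range(r)
    ]


def smallest_factor(n):
    """Return the smallest prime factor of n (n itself if n is prime or < 2)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def fft(x):
    """Recursive mixed-radix decimation-in-time Cooley-Tukey FFT."""
    n = len(x)
    r = smallest_factor(n)
    if r == n:
        # Prime (or trivial) length: fall back to a single butterfly.
        return dft_butterfly(x)
    m = n // r
    # Split into r interleaved sub-sequences and transform each (length m).
    subs = [fft(x[j::r]) for j in range(r)]
    out = [0j] * n
    for k in range(m):
        # Apply twiddle factors W_n^(j*k), then one radix-r butterfly.
        t = [subs[j][k] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(r)]
        b = dft_butterfly(t)
        for q in range(r):
            out[k + q * m] = b[q]
    return out
```

For example, `fft(list(range(12)))` factors the length as 2 × 2 × 3 and should agree with `dft_butterfly(list(range(12)))` up to floating-point rounding. In this sketch all arithmetic savings come from the recursive splitting; the butterflies themselves exploit no symmetry of the DFT matrix.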

Updated: 2020-08-01