GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2020-07-07, DOI: 10.1145/3394116
Karel Adámek 1, Sofia Dimoudi 2, Mike Giles 3, Wesley Armour 1

We present a GPU-tailored implementation of the overlap-and-save method, which convolves very long signals with short response functions. We have implemented several FFT algorithms in CUDA that exploit GPU shared memory, allowing for GPU-accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm that uses the NVIDIA FFT library (cuFFT). We demonstrate that by using a shared-memory-based FFT, we can achieve significant speed-ups for certain problem sizes and lower the memory requirements of the overlap-and-save method on GPUs.
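
For readers unfamiliar with the overlap-and-save method named in the abstract, the following is a minimal CPU-side sketch in plain Python. It is not the authors' CUDA implementation and does not use shared memory or cuFFT; it only illustrates the general idea of cutting a long signal into overlapping segments, convolving each segment with the short response function via an FFT, and discarding the aliased samples. The function names (`fft`, `overlap_save`, `direct_convolution`) and the `fft_size` parameter are assumptions made for this illustration.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def overlap_save(signal, h, fft_size=8):
    """Linear convolution of a long signal with a short filter h,
    computed segment by segment (illustrative sketch only)."""
    m = len(h)
    step = fft_size - m + 1  # number of valid outputs per segment
    assert step > 0 and fft_size & (fft_size - 1) == 0
    # The filter spectrum is computed once and reused for every segment.
    H = fft(list(h) + [0] * (fft_size - m))
    out_len = len(signal) + m - 1
    # Prepend m-1 zeros so the first segment yields the start of the output.
    padded = [0] * (m - 1) + list(signal)
    result = []
    pos = 0
    while len(result) < out_len:
        seg = padded[pos:pos + fft_size]
        seg = seg + [0] * (fft_size - len(seg))  # zero-pad the tail
        S = fft(seg)
        y = fft([a * b for a, b in zip(S, H)], inverse=True)
        # The first m-1 samples are corrupted by circular wrap-around:
        # discard them and keep the rest, scaling by 1/fft_size.
        result.extend(v.real / fft_size for v in y[m - 1:])
        pos += step  # consecutive segments overlap by m-1 samples
    return result[:out_len]

def direct_convolution(signal, h):
    """Reference O(N*M) convolution used to check the sketch."""
    out = [0.0] * (len(signal) + len(h) - 1)
    for i, s in enumerate(signal):
        for k, c in enumerate(h):
            out[i + k] += s * c
    return out
```

Calling `overlap_save(list(range(1, 21)), [1, 2, 3])` should agree with `direct_convolution` on the same inputs up to floating-point rounding. In this sketch each segment is transformed independently, so segments could in principle be processed in parallel.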

Updated: 2020-07-07