GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory
arXiv - CS - Mathematical Software Pub Date : 2019-10-04 , DOI: arxiv-1910.01972
Karel Adámek, Sofia Dimoudi, Mike Giles, Wesley Armour

We present a GPU-tailored implementation of the overlap-and-save method, a method for convolving very long signals with short response functions. We have implemented several FFT algorithms (using the CUDA programming language) which exploit GPU shared memory, allowing for GPU-accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm that uses the NVIDIA FFT library (cuFFT). We demonstrate that by using a shared-memory-based FFT we can achieve significant speed-ups for certain problem sizes and lower the memory requirements of the overlap-and-save method on GPUs.
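
For readers unfamiliar with the overlap-and-save method named in the abstract, the following is a minimal CPU sketch in plain Python. It is not the authors' CUDA implementation and does not use shared memory or cuFFT. The function names (`fft`, `ifft`, `overlap_save`, `direct_convolve`) and the `fft_size` parameter are illustrative assumptions. The idea shown is the standard one: cut the long signal into overlapping segments, convolve each with the short response via an FFT, and discard the first `len(kernel) - 1` outputs of each segment, which are corrupted by circular wrap-around.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (illustrative); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT, normalized by the transform length."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def overlap_save(signal, kernel, fft_size=8):
    """Linear convolution of a long signal with a short kernel.

    fft_size is a hypothetical tuning parameter: the segment length,
    which must be a power of two larger than len(kernel) - 1.
    """
    m = len(kernel)
    step = fft_size - (m - 1)  # valid outputs produced per segment
    assert step > 0 and fft_size & (fft_size - 1) == 0
    # The kernel spectrum is computed once and reused for every segment.
    kernel_spectrum = fft(list(kernel) + [0.0] * (fft_size - m))
    # Prepend m - 1 zeros so the first segment has the "history" it needs.
    padded = [0.0] * (m - 1) + list(signal)
    out_len = len(signal) + m - 1
    out = []
    pos = 0
    while len(out) < out_len:
        seg = padded[pos:pos + fft_size]
        seg += [0.0] * (fft_size - len(seg))  # zero-fill past the signal end
        product = [a * b for a, b in zip(fft(seg), kernel_spectrum)]
        y = ifft(product)
        # The first m - 1 samples are aliased by circular wrap-around: drop them.
        out.extend(v.real for v in y[m - 1:])
        pos += step  # consecutive segments overlap by m - 1 samples
    return out[:out_len]


def direct_convolve(signal, kernel):
    """Reference O(N*M) convolution used only to check the sketch."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for k, h in enumerate(kernel):
            out[i + k] += s * h
    return out
```

Each segment is processed independently of the others, which is the property that makes the method a natural fit for parallel hardware such as GPUs.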

Updated: 2020-04-13