Bandwidth-Optimal Random Shuffling for GPUs
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2021-06-11 , DOI: arxiv-2106.06161
Rory Mitchell, Daniel Stokes, Eibe Frank, Geoffrey Holmes

Linear-time algorithms that are traditionally used to shuffle data on CPUs, such as the method of Fisher-Yates, are not well suited to implementation on GPUs due to inherent sequential dependencies. Moreover, existing parallel shuffling algorithms show unsatisfactory performance on GPU architectures because they incur a large number of read/write operations to high-latency global memory. To address this, we provide a method of generating pseudo-random permutations in parallel by fusing suitable pseudo-random bijective functions with stream compaction operations. Our algorithm, termed 'bijective shuffle', trades increased per-thread arithmetic operations for reduced global memory transactions. It is work-efficient, deterministic, and only requires a single global memory read and write per shuffle input, thus maximising use of global memory bandwidth. To empirically demonstrate the correctness of the algorithm, we develop a consistent, linear-time statistical test for the quality of pseudo-random permutations based on kernel space embeddings. Empirical results show that the bijective shuffle algorithm outperforms competing algorithms on multicore CPUs and GPUs, showing improvements of between one and two orders of magnitude and approaching peak device bandwidth.
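
The core idea in the abstract, a pseudo-random bijection followed by stream compaction, can be illustrated with a small sequential sketch. This is not the authors' GPU implementation. It assumes a balanced Feistel network with a hash-based round function as the bijection, and it pads the index domain to a power of four so that the two Feistel halves have equal width. The names `feistel_bijection`, `bijective_shuffle`, and `_round_fn` are hypothetical. Each index of the padded domain is mapped independently, and only images that fall inside the real input range are kept, which is the step that would run as a parallel compaction on a GPU.

```python
import hashlib


def _round_fn(value, key, bits):
    """Hypothetical keyed round function: hash (key, value) down to `bits` bits."""
    digest = hashlib.blake2b(f"{key}:{value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") & ((1 << bits) - 1)


def feistel_bijection(i, half_bits, keys):
    """Balanced Feistel network: a bijection on [0, 2**(2 * half_bits))."""
    mask = (1 << half_bits) - 1
    left, right = i >> half_bits, i & mask
    for key in keys:
        # Each round is invertible, so the whole network is a bijection.
        left, right = right, left ^ _round_fn(right, key, half_bits)
    return (left << half_bits) | right


def bijective_shuffle(items, seed=0, rounds=4):
    """Return a deterministic pseudo-random permutation of `items`."""
    n = len(items)
    # Assumption of this sketch: pad the domain to a power of four
    # so that it splits into two halves of equal bit width.
    half_bits = 1
    while (1 << (2 * half_bits)) < n:
        half_bits += 1
    m = 1 << (2 * half_bits)
    keys = [(seed, r) for r in range(rounds)]
    # Map every index of the padded domain independently (one thread per
    # index on a GPU), then compact: keep only the images below n.
    images = (feistel_bijection(i, half_bits, keys) for i in range(m))
    return [items[j] for j in images if j < n]


if __name__ == "__main__":
    print(bijective_shuffle(list(range(10)), seed=42))
```

Because the bijection visits every value of the padded domain exactly once, filtering to values below `n` yields each input index exactly once, so the output is a permutation. The same seed always reproduces the same ordering, which matches the determinism claimed in the abstract.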

Updated: 2021-06-14