A radix sorting parallel algorithm suitable for graphic processing unit computing
Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2020-09-30, DOI: 10.1002/cpe.5818
Shi‐yang Xiao 1 , Cai‐lin Li 2 , Bao‐yun Guo 2 , Han Xiao 3

Radix sorting is a fundamental data processing operation in many areas of computing, and accelerating it on a Graphics Processing Unit (GPU) is of considerable practical significance. Heterogeneous parallel computing has attracted much attention and is widely applied because of its computational efficiency and its capacity for parallel, real‐time data processing. Taking advantage of GPU parallelism in numerical computation, a parallel design of the Binary_Least Significant Digit (LSD) first Radix Sorting (B_LSD_RS) algorithm based on the Open Computing Language (OpenCL) is proposed. The radix sorting algorithm is divided into multiple kernel tasks, and the execution order of the kernels is controlled by passing event information between them. The parallel algorithm is implemented and verified on a GPU + CPU heterogeneous platform. Experimental results show that, on the NVIDIA GTX 1070 computing platform, the OpenCL‐based B_LSD_RS parallel algorithm achieves speedups of 28.86 times, 11.01 times, and 2.14 times over, respectively, the sequential B_LSD_RS algorithm on an AMD Ryzen5 1600X CPU, the B_LSD_RS parallel algorithm based on Open Multi‐Processing (OpenMP), and the B_LSD_RS parallel algorithm based on Compute Unified Device Architecture (CUDA). The algorithm thus achieves not only high performance but also performance portability across different GPU computing platforms.
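
To make the idea of a binary LSD-first radix sort split into separate steps more concrete, the following is a minimal sequential sketch in Python. It is not the authors' OpenCL implementation: the abstract does not say how the algorithm is divided into kernels, so the three-step split used here (flag, scan, scatter) is an assumption based on a common way of structuring GPU radix sorts, and the function name `binary_lsd_radix_sort` and its `bits` parameter are hypothetical. On a GPU, each step could plausibly be a separate kernel, with each pass waiting for the previous one to finish.

```python
def binary_lsd_radix_sort(keys, bits=32):
    """Sort non-negative integers below 2**bits, one bit per pass, LSB first.

    Illustrative sketch only; the flag/scan/scatter split is an assumption,
    not the kernel decomposition used in the paper.
    """
    data = list(keys)
    for bit in range(bits):
        # Step 1 (flag): 1 where the current bit of the key is 0, else 0.
        zero_flag = [1 - ((k >> bit) & 1) for k in data]

        # Step 2 (scan): exclusive prefix sum of the flags, so zero_pos[i]
        # is the number of 0-bit keys that appear before index i.
        zero_pos = []
        running = 0
        for f in zero_flag:
            zero_pos.append(running)
            running += f
        total_zeros = running

        # Step 3 (scatter): 0-bit keys fill the front, 1-bit keys follow,
        # each group keeping its original order (the pass is stable).
        out = [0] * len(data)
        for i, k in enumerate(data):
            if zero_flag[i]:
                dest = zero_pos[i]
            else:
                dest = total_zeros + (i - zero_pos[i])
            out[dest] = k
        data = out
    return data
```

Because every pass is stable, processing the bits from least to most significant leaves the keys fully sorted after the last pass. In a parallel version, the loops inside each step would be independent per element (apart from the scan, which needs a parallel prefix-sum), which is what makes this formulation a natural fit for a GPU.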

Updated: 2020-09-30