Automatic block dimensioning on GPU-accelerated programs through particle swarm optimization
Information and Software Technology (IF 3.9), Pub Date: 2020-03-19, DOI: 10.1016/j.infsof.2020.106299
Claudio M.N.A. Pereira, Andre L.S. Pinheiro, Roberto Schirru

Context

The use of GPUs to improve the performance of computationally expensive systems is now widely explored. In GPU-accelerated programs, performance depends on how the problem is partitioned into blocks of threads, so that the parallel tasks to be executed fit the GPU architecture well. Although some general guidelines exist to help define block dimensions, finding the optimum partition is still a complex, problem-dependent task. This work investigates the use of particle swarm optimization (PSO) to optimize block dimensions with the aim of minimizing program execution time. The approach was evaluated on a GPU-accelerated wind field calculation program in which block dimensioning had been based on literature guidelines and empirical adjustments. Before PSO optimization, the program was about 25 times faster than the sequential program. After applying PSO, the speedup increased to about 60 times. Unexpected optimized configurations were observed, confirming that finding the optimum dimensioning is a complex task. The use of a robust optimization tool such as PSO therefore proved very beneficial, allowing automatic optimization of block dimensions without requiring a priori knowledge of the problem, the program's peculiarities, or the GPU architecture.
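
To make the notion of block dimensioning concrete, the following minimal Python sketch computes how many blocks a 2D launch needs for a given block shape and how many launched threads fall outside the domain. It is an illustration only: the 100 x 100 domain, the function name `launch_config`, and the 1024-threads-per-block limit are assumptions, not details taken from the paper.

```python
# Assumption: 1024 threads per block is the limit on common CUDA GPUs;
# the actual limit depends on the device.
MAX_THREADS_PER_BLOCK = 1024


def launch_config(nx, ny, bx, by):
    """Return ((grid_x, grid_y), idle_threads) for an nx-by-ny domain
    covered by blocks of bx-by-by threads (hypothetical helper)."""
    if bx * by > MAX_THREADS_PER_BLOCK:
        raise ValueError("block exceeds the assumed per-block thread limit")
    # Ceiling division: enough blocks to cover the domain in each direction.
    grid_x = (nx + bx - 1) // bx
    grid_y = (ny + by - 1) // by
    launched = grid_x * bx * grid_y * by
    idle = launched - nx * ny  # threads launched beyond the domain edges
    return (grid_x, grid_y), idle


# Compare a few block shapes on an assumed 100 x 100 domain.
for shape in [(16, 16), (32, 4), (10, 10)]:
    grid, idle = launch_config(100, 100, *shape)
    print(shape, grid, idle)
```

Idle threads are only one of several factors that affect execution time, which is part of why the best block shape is hard to predict by hand.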

Objective

Improve the speedup of GPU-accelerated programs by automatically defining optimized block dimensions using PSO.

Method

The study focused on a GPU-accelerated wind field calculation problem. A PSO was interfaced to the program in order to find the block dimensions that lead to a minimum execution time. Results were compared to results from the literature.
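
The abstract does not specify the PSO variant, its parameters, or how it was coupled to the program. As a rough sketch of the idea, the Python code below runs a standard global-best PSO with an inertia weight over integer block dimensions. Everything named here is an assumption made for illustration: `pso_block_dims`, the rounding of continuous positions to integers, the parameter values, the 1024-thread limit, and the synthetic `fake_time` function that stands in for actually running and timing the GPU program.

```python
import random


def pso_block_dims(measure_time, bounds, n_particles=12, n_iters=30,
                   w=0.7, c1=1.5, c2=1.5, max_threads=1024, seed=0):
    """Minimise measure_time(*dims) over integer block dimensions.

    measure_time is a hypothetical callback that would launch the GPU
    program with the given block dimensions and return its run time.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def decode(pos):
        # Round a continuous position to integer dimensions within bounds.
        return tuple(min(max(int(round(p)), lo), hi)
                     for p, (lo, hi) in zip(pos, bounds))

    def cost(pos):
        dims = decode(pos)
        threads = 1
        for d in dims:
            threads *= d
        if threads > max_threads:
            return float("inf")  # infeasible block: never selected as best
        return measure_time(*dims)

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return decode(gbest), gbest_cost


def fake_time(bx, by):
    # Synthetic stand-in for a measured execution time (not real data).
    return (bx - 16) ** 2 + (by - 8) ** 2 + 1.0


best_dims, best_cost = pso_block_dims(fake_time, bounds=[(1, 32), (1, 32)])
print(best_dims, best_cost)
```

In a real setting, `measure_time` would run the GPU program (ideally several times, to average out timing noise) and return the elapsed time, so each PSO evaluation costs one or more full program runs.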

Results

The speedup obtained with the proposed approach is more than twice the original speedup (about 60 times versus about 25 times relative to the sequential program).

Conclusion

PSO proved very beneficial, allowing automatic optimization of block dimensions without requiring a priori knowledge of the problem's or program's peculiarities or of the GPU architecture.



Updated: 2020-03-19