FusionCL: a machine-learning based approach for OpenCL kernel fusion to increase system performance
Computing (IF 3.3), Pub Date: 2021-06-03, DOI: 10.1007/s00607-021-00958-2
Yasir Noman Khalid, Muhammad Aleem, Usman Ahmed, Radu Prodan, Muhammad Arshad Islam, Muhammad Azhar Iqbal

Employing general-purpose graphics processing units (GPGPUs) with OpenCL has greatly reduced the execution time of data-parallel applications by exploiting the massive parallelism available. However, when an application with a small data size runs on a GPU, GPU resources are wasted because the application cannot fully utilize the GPU's compute cores. Because GPUs lack operating system support, there is no mechanism for sharing a GPU between two kernels. In this paper, we propose a GPU sharing mechanism between two kernels that increases GPU occupancy and, as a result, reduces the execution time of a job pool. However, if a pair of kernels competes for the same set of resources (i.e., both applications are compute-intensive or both are memory-intensive), kernel fusion may instead significantly increase the execution time of the fused kernels. It is therefore important to select an optimal pair of kernels for fusion, one that yields a significant speedup over their serial execution. This research presents FusionCL, a machine learning-based GPU sharing mechanism between a pair of OpenCL kernels. FusionCL uses a machine learning-based fusion suitability classifier to identify every pair of kernels in the job pool that is a suitable candidate for fusion. Then, from all the candidates, it uses a fusion speedup predictor to select the pair that will produce the maximum speedup after fusion over their serial execution. The experimental evaluation shows that the proposed kernel fusion mechanism reduces execution time by 2.83× when compared to a baseline scheduling scheme. When compared to the state of the art, the reduction in execution time is up to 8%.
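
The two-stage selection described in the abstract (filter pairs with a suitability classifier, then pick the pair with the highest predicted speedup) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Kernel` features, `select_fusion_pair`, and the two toy functions are hypothetical stand-ins, not the paper's trained models, features, or API.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Kernel:
    """A job-pool entry with two hypothetical features in [0, 1]."""
    name: str
    compute_intensity: float
    memory_intensity: float


Pair = Tuple[Kernel, Kernel]


def select_fusion_pair(
    job_pool: Iterable[Kernel],
    is_suitable: Callable[[Kernel, Kernel], bool],
    predict_speedup: Callable[[Kernel, Kernel], float],
) -> Optional[Pair]:
    """Return the suitable pair with the highest predicted speedup, or None."""
    # Stage 1: keep only the pairs the suitability classifier accepts.
    candidates = [
        (a, b) for a, b in combinations(job_pool, 2) if is_suitable(a, b)
    ]
    if not candidates:
        return None
    # Stage 2: rank the remaining candidates by predicted fusion speedup.
    return max(candidates, key=lambda pair: predict_speedup(*pair))


def toy_is_suitable(a: Kernel, b: Kernel) -> bool:
    # Stand-in for a trained classifier: reject pairs whose dominant
    # resource is the same (both compute-bound or both memory-bound).
    a_compute_bound = a.compute_intensity > a.memory_intensity
    b_compute_bound = b.compute_intensity > b.memory_intensity
    return a_compute_bound != b_compute_bound


def toy_predict_speedup(a: Kernel, b: Kernel) -> float:
    # Stand-in for a trained regressor: the more complementary the
    # resource profiles, the larger the assumed speedup.
    return (
        1.0
        + abs(a.compute_intensity - b.compute_intensity)
        + abs(a.memory_intensity - b.memory_intensity)
    )


if __name__ == "__main__":
    pool = [
        Kernel("matmul", 0.9, 0.2),
        Kernel("copy", 0.1, 0.9),
        Kernel("nbody", 0.8, 0.3),
        Kernel("reduce", 0.3, 0.7),
    ]
    best = select_fusion_pair(pool, toy_is_suitable, toy_predict_speedup)
    print([k.name for k in best] if best else "no suitable pair")
```

Passing the classifier and predictor as callables keeps the selection logic independent of whichever models are actually trained; the toy heuristics only mirror the abstract's observation that two kernels contending for the same resource are poor fusion candidates.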




Updated: 2021-06-03