Performance and energy consumption of a Gram–Schmidt process for vector orthogonalization on a processor integrated GPU
Sustainable Computing: Informatics and Systems (IF 3.8), Pub Date: 2020-09-30, DOI: 10.1016/j.suscom.2020.100456
Thomas Jakobs, Lukas Reinhardt, Gudula Rünger

In addition to general-purpose cores, modern processors are equipped with specialized hardware units such as processor integrated GPUs (iGPUs), which are the subject of this article. An iGPU is directly connected to the cores and provides several benefits, including low cost and energy efficiency. For the execution of scientific applications on iGPUs, the OpenCL framework is a suitable choice. In this article, we consider the modified Gram–Schmidt process for vector orthogonalization, which computes a QR decomposition, and develop several OpenCL program variants to be executed on an iGPU. The performance and energy consumption of these program variants are investigated on two different processor architectures, one with a Gen7.5 and one with a Gen9 iGPU architecture. The program variants result from various modifications, such as the use of local memory, the use of SIMD data types, and the avoidance of copy operations. Additionally, we show how the use of OpenCL SIMD data types and the avoidance of copy operations influence the energy consumption of the cores and the iGPU.
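For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sequential sketch of the textbook modified Gram–Schmidt process and the QR factors it produces. It is not the authors' OpenCL code and does not reflect any of their program variants; the function name `modified_gram_schmidt` and the list-of-columns data layout are assumptions chosen for illustration.

```python
import math


def modified_gram_schmidt(columns):
    """Orthonormalize a list of column vectors (hypothetical helper).

    Returns (q, r): q is a list of orthonormal columns and r is an
    upper triangular matrix (list of rows) such that
    columns[k][i] == sum(q[j][i] * r[j][k] for j in range(k + 1)),
    up to rounding.
    """
    n = len(columns)
    q = [[float(x) for x in col] for col in columns]  # working copy
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Normalize the current column.
        norm = math.sqrt(sum(x * x for x in q[k]))
        if norm == 0.0:
            raise ValueError("zero column encountered; input is rank deficient")
        r[k][k] = norm
        q[k] = [x / norm for x in q[k]]
        # Modified variant: remove the q[k] component from every
        # remaining column right away, using the already updated columns.
        for j in range(k + 1, n):
            r[k][j] = sum(a * b for a, b in zip(q[k], q[j]))
            q[j] = [b - r[k][j] * a for a, b in zip(q[k], q[j])]
    return q, r


if __name__ == "__main__":
    q, r = modified_gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
    for column in q:
        print(column)
```

The inner loop over the remaining columns consists of independent dot products and vector updates, which is the kind of work that a data-parallel device can process concurrently. How the article's variants actually map this work to the iGPU is not stated in the abstract.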




Updated: 2020-09-30