C-for-Metal: High Performance SIMD Programming on Intel GPUs
arXiv - CS - Programming Languages Pub Date : 2021-01-26 , DOI: arxiv-2101.11049 Guei-Yuan Lueh, Kaiyu Chen, Gang Chen, Joel Fuentes, Wei-Yu Chen, Fangwen Fu, Hong Jiang, Hongzheng Li, Daniel Rhee
The SIMT execution model is commonly used for general GPU development. CUDA
and OpenCL developers write scalar code that is implicitly parallelized by
the compiler and hardware. On Intel GPUs, however, this abstraction has profound
performance implications as the underlying ISA is SIMD and important hardware
capabilities cannot be fully utilized. To close this performance gap we
introduce C-For-Metal (CM), an explicit SIMD programming framework designed to
deliver close-to-the-metal performance on Intel GPUs. The CM programming
language and its vector/matrix types provide an intuitive interface to exploit
the underlying hardware features, allowing fine-grained register management,
SIMD size control and cross-lane data sharing. Experimental results show that
CM applications from different domains outperform the best-known SIMT-based
OpenCL implementations, achieving up to 2.7x speedup on the latest Intel GPU.
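
To make the contrast in the abstract concrete, here is a minimal sketch in plain standard C++. It is not CM syntax and does not use any CM API: the names `Vec`, `shift_right`, `simt_lane`, `run_simt` and `simd_neighbor_sum` are hypothetical and exist only for illustration. The first half mimics the SIMT view, where the programmer writes scalar per-lane code and something else (here an ordinary loop standing in for the compiler and hardware) runs it across lanes. The second half mimics the explicit SIMD view, where the vector width is part of the type and a cross-lane operation is expressed directly on the whole vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// --- SIMT-style view (illustrative only) ---
// The programmer writes scalar code for one lane. To read a neighbour's
// value, the lane goes back to the shared input buffer.
inline float simt_lane(const float* in, std::size_t lane) {
    float left = (lane == 0) ? 0.0f : in[lane - 1];
    return in[lane] + left;
}

// Stand-in for the implicit parallelisation done by compiler and hardware:
// the same scalar function is invoked once per lane.
template <std::size_t N>
std::array<float, N> run_simt(const std::array<float, N>& in) {
    std::array<float, N> out{};
    for (std::size_t lane = 0; lane < N; ++lane) {
        out[lane] = simt_lane(in.data(), lane);
    }
    return out;
}

// --- Explicit SIMD view (illustrative only, hypothetical type) ---
// The SIMD width N is chosen by the programmer and is part of the type.
template <std::size_t N>
struct Vec {
    std::array<float, N> lanes{};

    // Element-wise addition over all N lanes.
    Vec operator+(const Vec& other) const {
        Vec result;
        for (std::size_t i = 0; i < N; ++i) {
            result.lanes[i] = lanes[i] + other.lanes[i];
        }
        return result;
    }

    // Cross-lane data sharing: every lane receives the value of the lane
    // to its left; lane 0 is filled with zero.
    Vec shift_right() const {
        static_assert(N > 0, "vector width must be positive");
        Vec result;  // lanes start zero-initialised
        for (std::size_t i = 1; i < N; ++i) {
            result.lanes[i] = lanes[i - 1];
        }
        return result;
    }
};

// The whole computation is one expression on vector values.
template <std::size_t N>
Vec<N> simd_neighbor_sum(const Vec<N>& v) {
    return v + v.shift_right();
}
```

Both halves compute the same result. The difference the sketch tries to show is where the width and the lane-to-lane data movement are visible: in the second form they appear in the source code, which is the kind of control the abstract attributes to CM's vector and matrix types. Nothing here says anything about actual performance on Intel GPUs.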
Updated: 2021-01-28