

C-for-Metal: High Performance SIMD Programming on Intel GPUs
arXiv - CS - Programming Languages, Pub Date: 2021-01-26, DOI: arxiv-2101.11049
Guei-Yuan Lueh, Kaiyu Chen, Gang Chen, Joel Fuentes, Wei-Yu Chen, Fangwen Fu, Hong Jiang, Hongzheng Li, Daniel Rhee

The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by the compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications because the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized. To close this performance gap, we introduce C-For-Metal (CM), an explicit SIMD programming framework designed to deliver close-to-the-metal performance on Intel GPUs. The CM programming language and its vector/matrix types provide an intuitive interface to exploit the underlying hardware features, allowing fine-grained register management, SIMD size control, and cross-lane data sharing. Experimental results show that CM applications from different domains outperform the best-known SIMT-based OpenCL implementations, achieving up to 2.7x speedup on the latest Intel GPU.
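The contrast the abstract draws between implicit SIMT code and explicit SIMD code can be illustrated with a minimal host-side C++ sketch. This is not the CM API and does not run on a GPU: the `Vec` type, its `select` member, and the functions `simt_pair_sum` and `simd_pair_sum` are hypothetical names introduced only for illustration. The SIMT-style function describes the work of a single lane and leaves vectorization to the toolchain, while the SIMD-style function states the vector width and the cross-lane (strided) reads directly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-width SIMD vector. NOT the CM vector type;
// it only models "N lanes operated on together" in plain C++.
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> lane{};

    // Element-wise add: one logical operation covering all N lanes.
    Vec operator+(const Vec& other) const {
        Vec out;
        for (std::size_t i = 0; i < N; ++i) out.lane[i] = lane[i] + other.lane[i];
        return out;
    }

    // Strided sub-vector read: Size elements starting at `offset`, Stride apart.
    // This models cross-lane data sharing expressed by the programmer.
    template <std::size_t Size, std::size_t Stride>
    Vec<T, Size> select(std::size_t offset) const {
        static_assert(Size >= 1 && (Size - 1) * Stride < N, "selection exceeds vector");
        Vec<T, Size> out;
        for (std::size_t i = 0; i < Size; ++i) out.lane[i] = lane.at(offset + i * Stride);
        return out;
    }
};

// SIMT style (assumption: lane_id plays the role of a work-item index).
// The code is scalar; mapping lanes onto SIMD hardware is left implicit.
inline int simt_pair_sum(const std::array<int, 8>& in, std::size_t lane_id) {
    return in[2 * lane_id] + in[2 * lane_id + 1];
}

// Explicit SIMD style: the vector width (4) and the strided lane reads
// are written out by the programmer.
inline Vec<int, 4> simd_pair_sum(const Vec<int, 8>& in) {
    Vec<int, 4> even = in.select<4, 2>(0);  // lanes 0, 2, 4, 6
    Vec<int, 4> odd = in.select<4, 2>(1);   // lanes 1, 3, 5, 7
    return even + odd;
}
```

Both functions compute the same adjacent-pair sums; the difference is only in who decides the SIMD width and data movement. In the sketch that decision is made in source code, which is the kind of control the abstract attributes to CM.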


Update Date: 2021-01-28
