Manycore Performance-Portability: Kokkos Multidimensional Array Library
Scientific Programming Pub Date : 2012 , DOI: 10.3233/spr-2012-0343
H. Carter Edwards, Daniel Sunderland, Vicki Porter, Chris Amsler, Sam Mish

Large, complex scientific and engineering application codes have a significant investment in computational kernels that implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge because these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data parallel kernels, and (3) multidimensional arrays. Kernel execution performance depends heavily on data access patterns, especially on NVIDIA® devices. The optimal data access pattern can differ between manycore devices, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
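
The separation described in the abstract can be illustrated with a minimal C++ sketch. It is not the Kokkos Array API: the names `RowMajor`, `ColMajor`, `Array2D`, `RowSum`, `parallel_for_serial`, and `row_sums` are hypothetical and were chosen only for this example, and the serial loop stands in for a real parallel dispatch. The kernel is written once against `operator()(i, j)`, and the index-to-memory mapping is selected as a template parameter at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies; assumptions for illustration only.
// RowMajor: the rightmost index is contiguous, so one thread walking
// along j for a fixed i reads consecutive addresses.
struct RowMajor {
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t /*n0*/, std::size_t n1) {
        return i * n1 + j;
    }
};

// ColMajor: the leftmost index is contiguous, so work items with
// consecutive i touch adjacent addresses for the same j.
struct ColMajor {
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t /*n1*/) {
        return i + j * n0;
    }
};

// A two-dimensional array whose memory mapping is fixed at compile time
// by the Layout policy. Kernels only ever use operator()(i, j).
template <class Layout>
class Array2D {
public:
    Array2D(std::size_t n0, std::size_t n1)
        : n0_(n0), n1_(n1), data_(n0 * n1, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }

    std::size_t extent0() const { return n0_; }
    std::size_t extent1() const { return n1_; }
    const std::vector<double>& raw() const { return data_; }

private:
    std::size_t n0_, n1_;
    std::vector<double> data_;
};

// Data parallel kernel: work item i sums row i. The source is identical
// for every layout; only the template argument changes.
template <class Layout>
struct RowSum {
    const Array2D<Layout>& in;
    std::vector<double>& out;

    void operator()(std::size_t i) const {
        double sum = 0.0;
        for (std::size_t j = 0; j < in.extent1(); ++j) sum += in(i, j);
        out[i] = sum;
    }
};

// Serial stand-in for a device dispatch: calls the functor once per index.
template <class Functor>
void parallel_for_serial(std::size_t n, const Functor& f) {
    for (std::size_t i = 0; i < n; ++i) f(i);
}

// Fills a(i, j) = 10 * i + j and returns the per-row sums.
template <class Layout>
std::vector<double> row_sums(std::size_t n0, std::size_t n1) {
    Array2D<Layout> a(n0, n1);
    for (std::size_t i = 0; i < n0; ++i)
        for (std::size_t j = 0; j < n1; ++j)
            a(i, j) = 10.0 * static_cast<double>(i) + static_cast<double>(j);

    std::vector<double> out(n0, 0.0);
    parallel_for_serial(n0, RowSum<Layout>{a, out});
    return out;
}
```

Calling `row_sums<RowMajor>(2, 3)` and `row_sums<ColMajor>(2, 3)` runs the same kernel source over two different memory layouts and yields the same sums. In this sketch, switching the target device would amount to changing the layout argument, not the kernel body.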

Updated: 2020-09-25