A review of CUDA optimization techniques and tools for structured grid computing
Computing (IF 3.3), Pub Date: 2019-07-26, DOI: 10.1007/s00607-019-00744-1
Mayez A. Al-Mouhamed , Ayaz H. Khan , Nazeeruddin Mohammad

Recent advances in GPUs have opened new opportunities for harnessing their computing power for general-purpose computing. CUDA, an extension of the C programming language, was developed for programming NVIDIA GPUs. However, programming GPUs efficiently with CUDA is very tedious and error-prone, even for expert programmers: the programmer has to optimize resource occupancy and manage data transfers both between the host and the GPU and across the memory system. This paper presents the basic architectural optimizations and explores how they are implemented in research and industry compilers. The review focuses on accelerating computational science applications, such as the class of structured grid computations (SGCs). It also discusses the mismatch between current compiler techniques and the requirements for implementing efficient iterative linear solvers, and it explores the approaches computational scientists use to program SGCs. Finally, a set of tools providing the main optimization functionalities of an integrated library is proposed, to ease defining complex SGC data structures and optimizing solver code through an intelligent high-level interface and domain-specific annotations.
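To make "structured grid computation" and "iterative linear solver" concrete, here is a minimal sketch, not taken from the paper, of Jacobi iteration for the 2D Poisson equation on a uniform grid with a 5-point stencil. The function names `jacobi_sweep` and `jacobi_solve`, the grid size, the tolerance, and the boundary values are illustrative assumptions. It is written sequentially in Python for readability; what matters is the access pattern: each interior point is updated only from its four neighbors in the previous iterate, so a CUDA version would typically assign one thread per grid point and alternate between two device buffers across sweeps.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -laplace(u) = f on a uniform n x n grid.

    Returns (u_new, max_change). Every interior point reads only the
    previous iterate `u`, so all points could be updated in parallel
    (e.g. one CUDA thread per (i, j)). Boundary values are copied as-is.
    """
    n = len(u)
    u_new = [row[:] for row in u]  # copy keeps the Dirichlet boundary
    max_change = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # 5-point stencil: u_ij = (sum of 4 neighbors + h^2 * f_ij) / 4
            u_new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1]
                                  + h * h * f[i][j])
            max_change = max(max_change, abs(u_new[i][j] - u[i][j]))
    return u_new, max_change


def jacobi_solve(u, f, h, tol=1e-10, max_iters=10_000):
    """Repeat sweeps until the largest pointwise update falls below tol."""
    for it in range(1, max_iters + 1):
        u, change = jacobi_sweep(u, f, h)
        if change < tol:
            return u, it
    return u, max_iters


if __name__ == "__main__":
    n = 8                      # hypothetical grid size, boundary included
    h = 1.0 / (n - 1)
    f = [[0.0] * n for _ in range(n)]
    # Assumed test problem: u = 1 on all four edges, 0 inside initially.
    u0 = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
           for j in range(n)] for i in range(n)]
    u, iters = jacobi_solve(u0, f, h)
    print(f"{iters} sweeps, center value {u[n // 2][n // 2]:.6f}")
```

Jacobi is chosen here only because it is the simplest stationary iterative solver on a structured grid; it converges slowly compared with methods such as multigrid or conjugate gradient, which is one reason solver-level optimization matters beyond per-kernel tuning.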

Updated: 2019-07-26