GPU Programming Productivity in Different Abstraction Paradigms, ACM Transactions on Computing Education


GPU Programming Productivity in Different Abstraction Paradigms
ACM Transactions on Computing Education (IF 3.2), Pub Date: 2020-10-14, DOI: 10.1145/3418301
Patrick Daleiden, Andreas Stefik, Philip Merlin Uesbeck


Coprocessor architectures are prevalent in today’s High Performance Computing clusters for scientific computing and require specialized knowledge to use properly. Various alternative paradigms for parallel and offload computation exist, but little is known about the human-factors impact of using them. In a study with computer science students from the University of Nevada, Las Vegas who had no previous exposure to Graphics Processing Unit (GPU) programming, we compared NVIDIA CUDA C/C++, as the control group, with the Thrust library. The designers of Thrust claim that its higher level of abstraction enhances programmer productivity. The trial involved 91 participants and was administered through our computerized testing platform. The study was narrowly focused on the basic steps of an offloaded computation problem and was not intended as a comprehensive evaluation of the superiority of one approach over the other. Even so, we found evidence that, although Thrust was designed for ease of use, its abstractions tended to confuse students and in several cases diminished productivity. Specifically, the Thrust abstractions for (i) memory allocation through a C++ Standard Template Library-style vector library call, (ii) memory transfers between the host and the GPU coprocessor through an overloaded assignment operator, and (iii) execution of an offloaded routine through a generic transform library call instead of a CUDA kernel routine all performed either equal to or worse than their CUDA counterparts.
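To make the three abstractions concrete, the sketch below places a plain CUDA version next to a Thrust version of the same offloaded computation. It is a minimal illustration and not the task given to study participants: the element-doubling operation and the names `double_kernel`, `DoubleOp`, `double_with_cuda`, and `double_with_thrust` are assumptions introduced here, and error checking of the CUDA runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>
#include <vector>

// Hypothetical operation chosen for illustration: double every element.
__global__ void double_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Plain CUDA: explicit allocation, explicit copies, explicit kernel launch.
std::vector<float> double_with_cuda(const std::vector<float>& host_in) {
    int n = static_cast<int>(host_in.size());
    std::vector<float> host_out(n);
    if (n == 0) return host_out;  // avoid launching a kernel with zero blocks
    size_t bytes = n * sizeof(float);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);                                          // (i) allocation
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice);   // (ii) host -> device

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    double_kernel<<<blocks, threads>>>(d_in, d_out, n);                // (iii) kernel launch

    cudaMemcpy(host_out.data(), d_out, bytes, cudaMemcpyDeviceToHost); // (ii) device -> host
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}

// Functor applied element-wise by thrust::transform.
struct DoubleOp {
    __host__ __device__ float operator()(float x) const { return 2.0f * x; }
};

// Thrust: the same three steps expressed through library abstractions.
std::vector<float> double_with_thrust(const std::vector<float>& host_in) {
    thrust::host_vector<float> h_in(host_in.begin(), host_in.end());

    thrust::device_vector<float> d_in(h_in.size());   // (i) STL-style vector allocates device memory
    thrust::device_vector<float> d_out(h_in.size());
    d_in = h_in;                                       // (ii) assignment copies host -> device

    thrust::transform(d_in.begin(), d_in.end(),        // (iii) generic transform instead of a kernel
                      d_out.begin(), DoubleOp());

    thrust::host_vector<float> h_out(d_out.size());
    h_out = d_out;                                     // (ii) assignment copies device -> host
    return std::vector<float>(h_out.begin(), h_out.end());
}
```

Both functions take and return a `std::vector<float>`, so they can be called with the same input and their results compared. In the Thrust version the allocation, the transfers, and the launch are all implicit in ordinary-looking C++ statements, which is the kind of abstraction the study evaluated.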


Updated: 2020-10-14
