GPU Programming Productivity in Different Abstraction Paradigms: A Randomized Controlled Trial Comparing CUDA and Thrust

Abstract
Coprocessor architectures are prevalent in today's High Performance Computing clusters for scientific computing and require specialized knowledge for proper utilization. Various alternative paradigms for parallel and offload computation exist, but little is known about the human-factors impact of choosing among them. Our study compared NVIDIA CUDA C/C++, as a control group, against the Thrust library, whose designers claim that its higher level of abstraction enhances programmer productivity. Participants were computer science students at the University of Nevada, Las Vegas, with no previous exposure to Graphics Processing Unit (GPU) programming. The trial enrolled 91 participants and was administered through our computerized testing platform. The study focused narrowly on the basic steps of an offloaded computation and was not intended as a comprehensive evaluation of either approach; nevertheless, we found evidence that although Thrust was designed for ease of use, its abstractions tended to confuse students and in several cases diminished productivity. Specifically, Thrust's abstractions for (i) memory allocation through a C++ Standard Template Library-style vector call, (ii) memory transfers between the host and the GPU coprocessor through an overloaded assignment operator, and (iii) execution of an offloaded routine through a generic transform call instead of a hand-written CUDA kernel all performed either on par with or worse than their CUDA equivalents.
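To make the comparison concrete, the sketches below show the three steps in both paradigms using a simple element-wise negation. They are minimal illustrative examples, not the study's actual task code, and the problem size and kernel are assumptions chosen for brevity. In plain CUDA C/C++ all three steps are explicit:

```cpp
// Hedged sketch of the explicit CUDA steps (illustrative only).
// Error checking omitted for brevity.
#include <cuda_runtime.h>

__global__ void negate(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = -x[i];  // element-wise negation
}

int main() {
    const int n = 1 << 20;
    float* h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // (i) Explicit device allocation.
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    // (ii) Explicit host-to-device transfer.
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // (iii) Hand-written kernel launch.
    negate<<<(n + 255) / 256, 256>>>(d, n);

    // Copy the result back and clean up.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    delete[] h;
    return 0;
}
```

Thrust collapses the same three steps into vector construction, an overloaded assignment, and a generic transform call:

```cpp
// The same computation expressed with Thrust's abstractions
// (again an illustrative sketch, not the study's materials).
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>

int main() {
    const int n = 1 << 20;

    // (i) Allocation through STL-style vector constructors.
    thrust::host_vector<float> h(n, 1.0f);
    thrust::device_vector<float> d(n);

    // (ii) Host-to-device transfer through the overloaded assignment operator.
    d = h;

    // (iii) Offloaded execution through a generic transform call.
    thrust::transform(d.begin(), d.end(), d.begin(), thrust::negate<float>());

    // Device-to-host transfer, again by assignment.
    h = d;
    return 0;
}
```

The Thrust version hides cudaMalloc, cudaMemcpy, and the kernel definition entirely; it is precisely these three hidden steps whose effect on novice productivity the trial measured.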