
GPU Programming Productivity in Different Abstraction Paradigms: A Randomized Controlled Trial Comparing CUDA and Thrust

Published: 14 October 2020

Abstract

Coprocessor architectures in High Performance Computing are prevalent in today’s scientific computing clusters and require specialized knowledge for proper utilization. Various alternative paradigms for parallel and offload computation exist, but little is known about the human factors impact of using the different paradigms. Our study recruited computer science students from the University of Nevada, Las Vegas with no previous exposure to Graphics Processing Unit (GPU) programming and compared NVIDIA CUDA C/C++, as the control group, with the Thrust library. The designers of Thrust claim that its higher level of abstraction enhances programmer productivity. The trial was conducted with 91 participants and was administered through our computerized testing platform. The study was narrowly focused on the basic steps of an offloaded computation problem and was not intended to be a comprehensive evaluation of the superiority of one approach or the other. Even so, we found evidence that, although Thrust was designed for ease of use, its abstractions tended to confuse students and in several cases diminished productivity. Specifically, the Thrust abstractions for (i) memory allocation through a C++ Standard Template Library-style vector library call, (ii) memory transfers between the host and GPU coprocessor through an overloaded assignment operator, and (iii) execution of an offloaded routine through a generic transform library call instead of a CUDA kernel routine all performed either equal to or worse than their CUDA counterparts.
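To make the three abstractions named in the abstract concrete, the sketch below performs the same offloaded computation twice, once with the CUDA runtime API and once with Thrust. It is an illustration only and is not taken from the study's task materials: the scaling operation, the function names `scale_with_cuda` and `scale_with_thrust`, the `Scale` functor, and the block size of 256 are all assumptions made for this example, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>
#include <vector>

// ---- CUDA C/C++ style: explicit allocation, copies, and a kernel ----

// Hypothetical kernel: each thread scales one element.
__global__ void scale_kernel(const float* in, float* out, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = factor * in[i];
}

std::vector<float> scale_with_cuda(const std::vector<float>& host_in, float factor) {
    int n = static_cast<int>(host_in.size());
    std::vector<float> host_out(n);
    if (n == 0) return host_out;
    size_t bytes = n * sizeof(float);

    // (i) Memory allocation on the device.
    float* dev_in = nullptr;
    float* dev_out = nullptr;
    cudaMalloc(&dev_in, bytes);
    cudaMalloc(&dev_out, bytes);

    // (ii) Host-to-device transfer.
    cudaMemcpy(dev_in, host_in.data(), bytes, cudaMemcpyHostToDevice);

    // (iii) Execution of the offloaded routine as a kernel launch.
    int threads = 256;  // assumed block size
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(dev_in, dev_out, factor, n);

    // Device-to-host transfer (waits for the kernel to finish), then cleanup.
    cudaMemcpy(host_out.data(), dev_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return host_out;
}

// ---- Thrust style: vectors, assignment, and a generic transform ----

// Hypothetical functor applied to every element by thrust::transform.
struct Scale {
    float factor;
    __host__ __device__ float operator()(float x) const { return factor * x; }
};

std::vector<float> scale_with_thrust(const std::vector<float>& host_in, float factor) {
    thrust::host_vector<float> h_in(host_in.begin(), host_in.end());

    // (i) Memory allocation through STL-style vector constructors.
    thrust::device_vector<float> d_in(h_in.size());
    thrust::device_vector<float> d_out(h_in.size());

    // (ii) Host-to-device transfer through the overloaded assignment operator.
    d_in = h_in;

    // (iii) Execution through a generic transform call instead of a kernel.
    thrust::transform(d_in.begin(), d_in.end(), d_out.begin(), Scale{factor});

    // Device-to-host transfer, again through assignment.
    thrust::host_vector<float> h_out(d_out.size());
    h_out = d_out;
    return std::vector<float>(h_out.begin(), h_out.end());
}
```

Compiled with `nvcc` and called from a host program, both functions should return the input scaled by `factor`. The Thrust version is shorter, but the allocation, the transfer, and the launch are each hidden behind ordinary-looking C++ syntax, which is the kind of abstraction the trial examined.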

          Published in

          ACM Transactions on Computing Education, Volume 20, Issue 4
          December 2020
          146 pages
          EISSN:1946-6226
          DOI:10.1145/3428081

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 14 October 2020
          • Revised: 1 July 2020
          • Accepted: 1 July 2020
          • Received: 1 February 2020

          Qualifiers

          • research-article
          • Research
          • Refereed
