GPU Programming Productivity in Different Abstraction Paradigms: A Randomized Controlled Trial Comparing CUDA and Thrust

Abstract
Coprocessor architectures are prevalent in today's High Performance Computing clusters for scientific computing and require specialized knowledge for proper utilization. Various alternative paradigms for parallel and offload computation exist, but little is known about the human-factors impact of choosing among them. Our study compared NVIDIA CUDA C/C++, as a control group, against the Thrust library, whose designers claim that its higher level of abstraction enhances programmer productivity. Participants were computer science students at the University of Nevada, Las Vegas, with no previous exposure to Graphics Processing Unit (GPU) programming. The trial enrolled 91 participants and was administered through our computerized testing platform. The study focused narrowly on the basic steps of an offloaded computation and was not intended as a comprehensive evaluation of either approach; nevertheless, we found evidence that although Thrust was designed for ease of use, its abstractions tended to confuse students and in several cases diminished productivity. Specifically, Thrust's abstractions for (i) memory allocation through a C++ Standard Template Library-style vector call, (ii) memory transfers between the host and the GPU coprocessor through an overloaded assignment operator, and (iii) execution of an offloaded routine through a generic transform call instead of a hand-written CUDA kernel all performed either on par with or worse than their CUDA equivalents.
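To make the comparison concrete, the sketches below show the three steps in both paradigms using a simple element-wise negation. They are minimal illustrative examples, not the study's actual task code, and the problem size and kernel are assumptions chosen for brevity. In plain CUDA C/C++ all three steps are explicit:

```cpp
// Hedged sketch of the explicit CUDA steps (illustrative only).
// Error checking omitted for brevity.
#include <cuda_runtime.h>

__global__ void negate(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = -x[i];  // element-wise negation
}

int main() {
    const int n = 1 << 20;
    float* h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // (i) Explicit device allocation.
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    // (ii) Explicit host-to-device transfer.
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // (iii) Hand-written kernel launch.
    negate<<<(n + 255) / 256, 256>>>(d, n);

    // Copy the result back and clean up.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    delete[] h;
    return 0;
}
```

Thrust collapses the same three steps into vector construction, an overloaded assignment, and a generic transform call:

```cpp
// The same computation expressed with Thrust's abstractions
// (again an illustrative sketch, not the study's materials).
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>

int main() {
    const int n = 1 << 20;

    // (i) Allocation through STL-style vector constructors.
    thrust::host_vector<float> h(n, 1.0f);
    thrust::device_vector<float> d(n);

    // (ii) Host-to-device transfer through the overloaded assignment operator.
    d = h;

    // (iii) Offloaded execution through a generic transform call.
    thrust::transform(d.begin(), d.end(), d.begin(), thrust::negate<float>());

    // Device-to-host transfer, again by assignment.
    h = d;
    return 0;
}
```

The Thrust version hides cudaMalloc, cudaMemcpy, and the kernel definition entirely; it is precisely these three hidden steps whose effect on novice productivity the trial measured.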