Skip to main content
Log in

clusterCL: comprehensive support for multi-kernel data-parallel applications in heterogeneous asymmetric clusters

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

Heterogeneous cluster systems consisting of CPUs and different kinds of accelerators have become mainstream in HPC. Programming such systems is a difficult task and requires addressing manifold challenges that stem from the intricate composition of such systems and peculiarities of scientific applications. A broad range of obstacles preventing efficient execution have to be considered and dealt with properly. In this paper, we propose a systematic approach and a framework that is capable of providing comprehensive support for running data-parallel applications in heterogeneous asymmetric clusters. Our implementation provides work partitioning and distribution by ensuring workload balance in the cluster while handling of partitioning-induced communication and synchronization in a transparent way. In our experimental section, we choose 11 representative scientific applications from different domains to evaluate our approach. Experimental results show a strong speedup and workload balance for different cluster configurations.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Similar content being viewed by others

References

  1. Alves A, Rufino J, Pina A, Santos L (2013) clOpenCL—supporting distributed heterogeneous computing in HPC clusters. In: Euro-Par 2012: Parallel Processing Workshops, LNCS, vol 7640. Springer, Berlin, pp 112–122

  2. AMD: AMD accelerated parallel processing SDK. https://developer.amd.com/tools-and-sdks/. Accessed May 2019

  3. Aoki R, Oikawa S, Nakamura T, Miki S (2011) Hybrid OpenCL: enhancing OpenCL for distributed processing. In: International Symposium on Parallel and Distributed Processing with Applications (ISPA), pp 149–154

  4. Augonnet Cedric, Thibault Samuel, Namyst Raymond, Wacrenier Pierre-Andre (2011) Starpu: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr Comput Pract Exp 23(2):187–198

    Article  Google Scholar 

  5. Barak A, Ben-Nun T, Levy E, Shiloh A (2010) A package for OpenCL based heterogeneous computing on clusters with many GPU devices. In: 2010 IEEE Conference on Cluster Computing Workshops and Posters, pp 1–7

  6. Beaumont O, Becker BA, DeFlumere A, Eyraud-Dubois L, Lambert T, Lastovetsky A (2019) Recent advances in matrix partitioning for parallel computing on heterogeneous platforms. IEEE Trans Parallel Distrib Syst 30(1):218–229

    Article  Google Scholar 

  7. Beaumont O, Boudet V, Rastello F, Robert Y (2002) Partitioning a square into rectangles: Np-completeness and approximation algorithms. Algorithmica 34(3):217–239

    Article  MathSciNet  Google Scholar 

  8. Coti C, Herault T, Lemarinier P, Pilard L, Rezmerita A, Rodriguezb E, Cappello F (2006) Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI. In: SC’06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, p 18

  9. Daly JT (2006) A higher order estimate of the optimum checkpoint interval for restart dumps. Fut Gener Comput Syst 22(3):303–312

    Article  Google Scholar 

  10. Diop T, Gurfinkel S, Anderson J, Jerger NE (2013) Distcl: a framework for the distributed execution of opencl kernels. In: 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, pp 556–566

  11. Duato J, Peña AJ, Silla F, Mayo R, Quintana-Ortí ES (2010) rCUDA: reducing the number of GPU-based accelerators in high performance clusters. In: HPCS, pp 224–231

  12. Eskikaya B, Altilar DT (2012) Distributed OpenCL distributing OpenCL platform on network scale. Int J Comput Appl ACCTHPCA(2):25–30

    Google Scholar 

  13. Gentzsch V (2001) Sun grid engine: towards creating a compute power grid. In: Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid, pp 35–36

  14. Grasso I, Pellegrini S, Cosenza B, Fahringer T (2014) A uniform approach for programming distributed heterogeneous computing systems. J Parallel Distrib Comput 74(12):3228–3239 Domain-specific languages and high-level frameworks for high-performance computing

    Article  Google Scholar 

  15. Grauer-Gray S, Xu L, Searles R, Ayalasomayajula S, Cavazos J (2012) Auto-tuning a high-level language targeted to GPU codes. In: 2012 Innovative Parallel Computing (InPar), pp 1–10

  16. Grewe D, O’Boyle MFP (2011) A static task partitioning approach for heterogeneous systems using opencl. In: Knoop J (ed) Compiler construction. Springer, Berlin, pp 286–305

    Chapter  Google Scholar 

  17. Gschwandtner P, Durillo JJ, Fahringer T (2014) Multi-objective auto-tuning with insieme: optimization and trade-off analysis for time, energy and resource usage. In: Euro-Par 2014 Parallel Processing, LNCS, vol 8632. Springer, Berlin, pp 87–98

  18. Kegel P, Steuwer M, Gorlatch S (2012) dOpenCL: towards a uniform programming approach for distributed heterogeneous multi-/many-core systems. In: Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), pp 174–186

  19. Khronos OpenCL Working Group: The OpenCL C Specification. Version 2.0. http://www.khronos.org/opencl. Accessed May 2019

  20. Kim J, Seo S, Lee J, Nah J, Jo G, Lee J (2012) SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters. In: Proceedings of the 26th ACM International Conference on Supercomputing (ICS). San Servolo Island, Venice, Italy, pp 341–352

  21. Lee J, Samadi M, Mahlke S (2015) Orchestrating multiple data-parallel kernels on multiple devices. In: 2015 International Conference on Parallel Architecture and Compilation (PACT), pp 355–366

  22. Lee J, Samadi M, Park Y, Mahlke S (2013) Transparent CPU–GPU collaboration for data-parallel kernels on heterogeneous systems. In: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT’13. IEEE Press, Piscataway, NJ, USA, pp 245–256

  23. Louis-Noel Pouchet: PolyBench/GPU. http://web.cse.ohio-state.edu/~pouchet.2/software/polybench/ (2012). Accessed May 2019

  24. Luk CK, Hong S, Kim H (2009) Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp 45–55

  25. MPI Forum: MPI: a message-passing interface standard, Version 3.0. http://www.mpi-forum.org. Accessed May 2019

  26. nVidia: CUDA zone. https://developer.nvidia.com/cuda-zone. Accessed May 2019

  27. nVidia: OpenCL SDK. https://developer.nvidia.com/opencl. Accessed May 2019

  28. OpenACC: OpenACC Standard. https://www.openacc.org/. Accessed May 2019

  29. OpenMP: OpenMP specification. https://www.openmp.org/. Accessed May 2019

  30. Planas J, Badia RM, Ayguadé E, Labarta J (2013) Self-adaptive OMPSS tasks in heterogeneous environments. In: 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp 138–149

  31. Raca V, Mehofer E (2015) Device-sensitive framework for handling heterogeneous asymmetric clusters efficiently. In: 26th IEEE International Symposium on Computer Architecture and High Performance Computing. Florianopolis, Brazil, pp 181–188

  32. Raca V, Mehofer E, Hudec M (2016) Optimal time and energy efficient work distributions in heterogeneous systems. In: 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP). Heraklion, Greece, pp 440–447

  33. Rasch A, Bigge J, Wrodarczyk M, Schulze R, Gorlatch S (2019) dOCAL: high-level distributed programming with OpenCL and CUDA. J Supercomput

  34. Shen J, Varbanescu AL, Lu Y, Zou P, Sips H (2016) Workload partitioning for accelerating applications on heterogeneous platforms. IEEE Trans Parallel Distrib Syst 27(9):2766–2780

    Article  Google Scholar 

  35. Shen J, Varbanescu AL, Martorell X, Sips H (2015) Matchmaking applications and partitioning strategies for efficient execution on heterogeneous platforms. In: 2015 44th International Conference on Parallel Processing, pp 560–569

  36. Steuwer M, Kegel P, Gorlatch S (2011) Skelcl—a portable skeleton library for high-level gpu programming. In: 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp 1176–1182

  37. Top500: Top 500 list. https://www.top500.org/lists/2019/06. Accessed July 2019

  38. Wen Y, Wang Z, O’Boyle MFP (2014) Smart multi-task scheduling for openCL programs on CPU/GPU heterogeneous platforms. In: 2014 21st International Conference on High Performance Computing (HiPC), pp 1–10

  39. Yoo AB, Jette MA, Grondona M (2003) Slurm: simple linux utility for resource management. In: Feitelson D, Rudolph L, Schwiegelshohn U (eds) Job scheduling strategies for parallel processing. Springer, Berlin, pp 44–60

    Chapter  Google Scholar 

  40. Young JW (1974) A first order approximation to the optimum checkpoint interval. Commun ACM 17(9):530–531

    Article  Google Scholar 

Download references

Acknowledgements

This work was supported in part by the Austrian Science Fund (FWF) Project P 29783 Dynamic Runtime System for Future Parallel Architectures.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Valon Raca.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Raca, V., Mehofer, E. clusterCL: comprehensive support for multi-kernel data-parallel applications in heterogeneous asymmetric clusters. J Supercomput 76, 9976–10008 (2020). https://doi.org/10.1007/s11227-020-03234-w

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-020-03234-w

Keywords

Navigation