
Development of benchmark automation suite and evaluation of various high-performance computing systems


Abstract

This study developed a dynamic benchmark automation suite to measure a range of benchmark performances and evaluate various high-performance computing (HPC) systems. The suite supports automated scaling tests and collects profiling data from hardware performance counters to analyze system characteristics. We selected four HPC benchmarks (STREAM, High-Performance Linpack, High-Performance Conjugate Gradient, and the NAS Parallel Benchmark) for the experiments and configured testbeds based on five different systems as well as a 16-node Intel Knights Landing (KNL) cluster. The Intel KNL system showed unstable memory performance yet high benchmark performance over a specific input range. The modern Intel systems performed well on compute-intensive workloads, whereas the latest AMD system showed high efficiency on memory-intensive and real-application workloads. We also verified that each system has an optimal environment and characteristic profile for different combinations of experimental variables and profiling data.
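As a rough illustration of what an automated scaling test with counter-based profiling might look like, the following is a minimal sketch: it sweeps MPI process counts for each benchmark binary and wraps every run in Linux perf stat to capture hardware counters. The binary paths, process counts, and counter events are illustrative assumptions, not the authors' actual suite; the paper's tooling could equally rely on PAPI or LIKWID.

```python
# Hypothetical sketch of an automated scaling-test driver (not the authors'
# implementation): run each benchmark at increasing MPI process counts and
# record hardware performance counters via Linux `perf stat`.
import csv
import subprocess

BENCHMARKS = {                # benchmark name -> executable (assumed paths)
    "stream": "./stream",
    "hpl": "./xhpl",
    "hpcg": "./xhpcg",
    "npb_cg": "./cg.C.x",
}
PROC_COUNTS = [1, 2, 4, 8, 16, 32, 64]           # scaling dimension
EVENTS = "instructions,cycles,cache-misses"      # illustrative counter events


def run_once(name: str, binary: str, nprocs: int) -> dict:
    """Run one benchmark at one scale under perf stat and collect the result."""
    cmd = [
        "perf", "stat", "-e", EVENTS, "-x", ",",  # CSV counter output on stderr
        "mpirun", "-np", str(nprocs), binary,
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "benchmark": name,
        "nprocs": nprocs,
        "returncode": proc.returncode,
        "counters": proc.stderr.strip(),  # raw perf CSV lines for later parsing
    }


def main() -> None:
    fields = ["benchmark", "nprocs", "returncode", "counters"]
    with open("scaling_results.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        for name, binary in BENCHMARKS.items():
            for nprocs in PROC_COUNTS:
                writer.writerow(run_once(name, binary, nprocs))


if __name__ == "__main__":
    main()
```

A driver of this shape makes the scaling dimension (process count) and the profiling dimension (counter events) independent experimental variables, which matches the abstract's description of evaluating systems across combinations of both.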




Acknowledgements

This work was partly supported by a Creative Allied Project (CAP) grant funded by the Korean government (MSIT) (No. G-19-GT-CU01) and by a Korea Institute of Science and Technology Information (KISTI) grant (No. K-19-L02-C06-S01).

Author information

Corresponding author

Correspondence to Chan-Yeol Park.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Rho, S., Park, G., Choi, J.E. et al. Development of benchmark automation suite and evaluation of various high-performance computing systems. Cluster Comput 24, 159–179 (2021). https://doi.org/10.1007/s10586-020-03167-2

