The MVAPICH project: Transforming research into high-performance MPI library for HPC community

https://doi.org/10.1016/j.jocs.2020.101208

Highlights

  • Brief overview of the MVAPICH project and its growth during the last 19 years.

  • Research Innovation of the MVAPICH project for High-Performance Computing during the last 19 years.

  • An in-depth description of the translational process involved in the design, development, and deployment of the production quality MPI libraries.

  • Impact and lessons learned from the project.

Abstract

High-Performance Computing (HPC) research, from hardware and software to the end applications, provides remarkable computing power to help scientists solve complex problems in science, engineering, and even daily business. Over the last decades, Message Passing Interface (MPI) libraries have been powering numerous scientific applications, such as weather forecasting, earthquake simulation, and physics and chemistry simulations, to conduct ever-larger-scale experiments. However, practicing HPC research is always challenging. In this article, we use the well-established MVAPICH project to discuss how researchers can transform HPC research into a high-performance MPI library and translate it to the HPC community.

Introduction

The field of High-Performance Computing (HPC) has seen steady growth during the last 25 years. For example, based on the TOP500 list [1], the number one supercomputer in 1995 delivered only 170 GFlop/s, whereas the number one supercomputer on the latest June 2020 list delivers 415.5 PFlop/s, an improvement of more than six orders of magnitude. The world is also heading into the ExaFlop era soon [2]. Such progress has been made possible by design and development in hardware and software technologies as well as applications.

Most parallel applications during the last 25 years have continued to use Message Passing Interface (MPI) libraries conforming to the MPI Standard [3]. As the underlying hardware technologies (processors and networking) continue to evolve, it is the responsibility of the MPI library to extract and deliver performance, scalability, and fault tolerance to the parallel applications. Thus, designing a high-performance, scalable, fault-tolerant, and production-quality MPI library is important to the progress of the HPC field.
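
To make the role of the standard concrete, the following minimal point-to-point example in C is a sketch we add here for illustration (it is not taken from the article). The same source compiles unchanged against any conformant MPI library, including MVAPICH2; the library underneath is responsible for mapping MPI_Send and MPI_Recv onto the particular interconnect and for delivering the performance, scalability, and fault tolerance mentioned above.

    /* Minimal MPI point-to-point sketch (illustrative only).
     * Build: mpicc hello_mpi.c -o hello_mpi
     * Run with at least two ranks, e.g.: mpirun -np 2 ./hello_mpi */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Rank 0 sends one integer to rank 1 with tag 0. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Rank 1 receives the integer from rank 0. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Because the interface is standardized, improvements inside the library (for example, better use of RDMA on InfiniBand) reach such applications without any source changes.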

Section snippets

Overview of the MVAPICH project and its evolution

HPC systems in the late nineties used proprietary networking technologies such as Myrinet and Quadrics. A new open-standard networking technology called InfiniBand [4] was introduced in October 2000 for datacenters. However, no MPI library was available to take advantage of this technology for HPC systems. The MVAPICH Project [5] at The Ohio State University took a giant step in 2001 by designing an MPI library to exploit the advanced features of the InfiniBand networking technology. The

Research innovation of the MVAPICH project for HPC

The research of the MVAPICH project is primarily driven by developments in the HPC community, as summarized in Fig. 1. The innovations of the MVAPICH project can be roughly classified into four categories: (1) enabling new programming models for HPC, (2) leveraging cutting-edge software/hardware technologies, (3) designing high-performance MPI communication middleware, and (4) powering novel scientific applications using MPI. As shown in Fig. 1, the MVAPICH project was launched in 2001 to address the
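
As one concrete illustration of the "leveraging cutting-edge software/hardware technology" category, GPU-aware MPI designs (see the GPU-aware MPI work by Wang et al. in the reference list) allow applications to pass GPU device buffers directly to MPI calls. The sketch below is a hypothetical example assuming a CUDA-aware MPI build such as MVAPICH2-GDR; it is not code from the article, and error checking is omitted for brevity.

    /* GPU-aware MPI sketch (illustrative only; assumes a CUDA-aware MPI
     * library such as MVAPICH2-GDR, one GPU per rank, and at least two ranks).
     * The device pointer is handed straight to MPI; the library handles the
     * data movement without an explicit host staging copy in the application. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        int rank;
        const int n = 1 << 20;          /* 1 M floats (~4 MB) */
        float *d_buf = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&d_buf, n * sizeof(float));
        cudaMemset(d_buf, 0, n * sizeof(float));

        if (rank == 0)
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);   /* device buffer */
        else if (rank == 1)
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }

Without such GPU awareness, the application would have to stage every message through host memory with explicit cudaMemcpy calls around each MPI operation.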

Translation process

Fig. 2 depicts the high-level method we follow to perform the various translational research activities. To transform our research into a high-performance MPI library and translate it to the HPC community, the MVAPICH project employs a research, development, and release cycle, as depicted in Fig. 3 [29]. Four primary phases are involved in this process, as described below. These phases have been repeated over the years as computing and networking technologies and

Impact and lessons learned

As of August 2020, the MVAPICH2 software libraries are being used by more than 3100 organizations in 89 countries. The list of organizations (which register voluntarily) is available from the ‘users’ tab of the project's website [5]. Furthermore, more than 810,000 downloads have taken place from the project site. The software is also being distributed by many vendors as part of their software distributions. MVAPICH2 is also powering many top supercomputers, including the 4th

Conclusion

In this paper, we discuss how high-impact computer science research can be translated into production-quality software for the community. The MVAPICH project, an academic project sustained over the last 19 years, has been successfully transformed into a production-quality, high-performance MPI library for the HPC community. The project involves a standard research and development process, a release cycle, participation in open-source communities, and deployment on production HPC clusters. As a

Authors’ contributions

Conception and design of study: D.K. Panda, H. Subramoni, C.-H. Chu, M. Bayatpour; acquisition of data: D.K. Panda, H. Subramoni; analysis and/or interpretation of data: C.-H. Chu, M. Bayatpour.

Drafting the manuscript: C.-H. Chu, M. Bayatpour, H. Subramoni, D.K. Panda; revising the manuscript critically for important intellectual content: C.-H. Chu, M. Bayatpour, H. Subramoni, D.K. Panda.

Acknowledgment

This research is supported in part by NSF grants #1931537, #1450440, #1664137, #1818253, and XRAC grant #NCR-130002.


References (33)

  • H. Meuer et al., TOP 500 Supercomputer Sites, 2020.

  • J. Dongarra et al., Race to exascale, Comput. Sci. Eng., 2019.

  • Message Passing Interface Forum, MPI: A Message-Passing Interface Standard, 1994.

  • InfiniBand Trade Association, http://www.infinibandta.com...

  • Network-Based Computing Laboratory, MVAPICH: MPI Over InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE, 2001.

  • J. Liu et al., High performance RDMA-based MPI implementation over InfiniBand, Int. J. Parallel Program., 2004.

  • Network-Based Computing Laboratory, NOWLAB::Publications, 2001.

  • W. Huang et al., Design of high performance MVAPICH2: MPI2 over InfiniBand, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), vol. 1, 2006.

  • J. Jose et al., Unifying UPC and MPI runtimes: experience with MVAPICH, Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 2010.

  • H. Wang et al., GPU-Aware MPI on RDMA-Enabled clusters: design, implementation and evaluation, IEEE Trans. Parallel Distrib. Syst., 2014.

  • S. Chakraborty et al., Designing scalable and high-performance MPI libraries on Amazon elastic fabric adapter, 2019 IEEE Symposium on High-Performance Interconnects (HOTI), 2019.

  • J.M. Hashmi et al., Designing efficient shared address space reduction collectives for multi-/many-cores, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018.

  • S. Chakraborty et al., Contention-aware Kernel-assisted MPI collectives for multi-/many-core systems, 2017 IEEE International Conference on Cluster Computing (CLUSTER), 2017.

  • J.M. Hashmi et al., FALCON: efficient designs for zero-copy MPI datatype processing on emerging architectures, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019.

  • H. Subramoni et al., Designing dynamic and adaptive MPI point-to-point communication protocols for efficient overlap of computation and communication, High Performance Computing, 2017.

  • H. Subramoni et al., Designing MPI library with dynamic connected transport (DCT) of InfiniBand: early experiences, Supercomputing, 2014.

Dhabaleswar K. (DK) Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at The Ohio State University. He has published over 500 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 3100 organizations worldwide (in 89 countries). As of August 2020, more than 810,000 downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 4th, 8th, 12th, 18th, and 19th ranked ones) in the TOP500 list. The RDMA packages for Apache Spark, Apache Hadoop, and Memcached, together with the OSU HiBD benchmarks from his group (http://hibd.cse.ohio-state.edu), are also publicly available. These libraries are currently being used by more than 330 organizations in 36 countries. As of August 2020, more than 37,100 downloads of these libraries have taken place. MPI-driven approaches to achieve high-performance and scalable versions of Deep Learning frameworks (TensorFlow, PyTorch, and MXNet) are available from https://hidl.cse.ohio-state.edu. Prof. Panda is an IEEE Fellow. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.

Hari Subramoni received the Ph.D. degree in Computer Science from The Ohio State University, Columbus, OH, in 2013. He has been a research scientist in the Department of Computer Science and Engineering at The Ohio State University, USA, since September 2015. His current research interests include high-performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network-topology-aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data, and cloud computing. He has published over 50 papers in international journals and conferences related to these research areas. Dr. Subramoni currently conducts research on, and works on the design and development of, the MVAPICH2, MVAPICH2-GDR, and MVAPICH2-X software packages. He is a member of IEEE. More details about Dr. Subramoni are available from http://www.cse.ohio-state.edu/~subramon.

Ching-Hsiang Chu received the Ph.D. degree in Computer Science and Engineering from The Ohio State University, Columbus, Ohio, USA. He received the BS and MS degrees in computer science and information engineering from the National Changhua University of Education, Taiwan, and the National Central University, Taiwan, respectively. His research interests include High-Performance Computing, GPU communication, and wireless networks. He has authored or co-authored over 30 papers in conferences and journals related to these research areas. More details about Dr. Chu are available from https://kingchc.gitlab.io.

Mohammadreza Bayatpour is a Ph.D. candidate in Computer Science and Engineering at The Ohio State University, Columbus, Ohio, USA. He received the BS degree in computer engineering from the Sharif University of Technology in Tehran, Iran, and joined the Network-Based Computing Lab at The Ohio State University in 2015. His research interests include High-Performance Computing and parallel computer architecture. He has authored or co-authored over 15 papers in conferences and journals related to these research areas. More details are available from http://web.cse.ohio-state.edu/~bayatpour.1/.
