The MVAPICH project: Transforming research into high-performance MPI library for HPC community

https://doi.org/10.1016/j.jocs.2020.101208

Highlights

  • Brief overview of the MVAPICH project and its growth during the last 19 years.

  • Research Innovation of the MVAPICH project for High-Performance Computing during the last 19 years.

  • An in-depth description of the translational process involved in the design, development, and deployment of the production quality MPI libraries.

  • Impact and lessons learned from the project.

Abstract

High-Performance Computing (HPC) research, from hardware and software to the end applications, provides remarkable computing power to help scientists solve complex problems in science, engineering, and even daily business. Over the last decades, Message Passing Interface (MPI) libraries have been powering numerous scientific applications, such as weather forecasting, earthquake simulation, and physics and chemistry simulations, to conduct ever-larger-scale experiments. However, practicing HPC research is always challenging. In this article, we use the well-established MVAPICH project to discuss how researchers can transform HPC research into a high-performance MPI library and translate it to the HPC community.

Introduction

The field of High-Performance Computing (HPC) has seen steady growth during the last 25 years. For example, based on the TOP500 list [1], the number one supercomputer in 1995 delivered only 170 GFlop/s, whereas the number one supercomputer on the latest June 2020 list delivers 415.5 PFlop/s, an improvement of more than six orders of magnitude. The world is also heading into the ExaFlop era soon [2]. Such progress has been made possible by design and development in hardware and software technologies as well as applications.

Most parallel applications during the last 25 years have continued to use Message Passing Interface (MPI) libraries conforming to the MPI Standard [3]. As the underlying hardware technologies (processors and networking) continue to evolve, it is the responsibility of the MPI library to extract and deliver performance, scalability, and fault tolerance to the parallel applications. Thus, designing a high-performance, scalable, fault-tolerant, and production-quality MPI library is important to the progress of the HPC field.
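
To make the role of the standard concrete, the following minimal point-to-point example in C is a sketch we add here for illustration (it is not taken from the article). The same source compiles unchanged against any conformant MPI library, including MVAPICH2; the library underneath is responsible for mapping MPI_Send and MPI_Recv onto the particular interconnect and for delivering the performance, scalability, and fault tolerance mentioned above.

    /* Minimal MPI point-to-point sketch (illustrative only).
     * Build: mpicc hello_mpi.c -o hello_mpi
     * Run with at least two ranks, e.g.: mpirun -np 2 ./hello_mpi */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Rank 0 sends one integer to rank 1 with tag 0. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Rank 1 receives the integer from rank 0. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Because the interface is standardized, improvements inside the library (for example, better use of RDMA on InfiniBand) reach such applications without any source changes.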

Section snippets

Overview of the MVAPICH project and its evolution

HPC systems in the late nineties used proprietary networking technologies such as Myrinet and Quadrics. A new open-standard networking technology called InfiniBand [4] was introduced in October 2000 for datacenters. However, no MPI library was available to take advantage of this technology for HPC systems. The MVAPICH Project [5] at The Ohio State University took a giant step in 2001 by designing an MPI library to exploit the advanced features of the InfiniBand networking technology. The

Research innovation of the MVAPICH project for HPC

The research of the MVAPICH project is primarily driven by developments in the HPC community, as summarized in Fig. 1. The innovations of the MVAPICH project can be roughly classified into four categories: (1) enabling new programming models for HPC, (2) leveraging cutting-edge software/hardware technologies, (3) designing high-performance MPI communication middleware, and (4) powering novel scientific applications using MPI. As shown in Fig. 1, the MVAPICH project was launched in 2001 to address the
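
As one concrete illustration of the "leveraging cutting-edge software/hardware technology" category, GPU-aware MPI designs (see the GPU-aware MPI work by Wang et al. in the reference list) allow applications to pass GPU device buffers directly to MPI calls. The sketch below is a hypothetical example assuming a CUDA-aware MPI build such as MVAPICH2-GDR; it is not code from the article, and error checking is omitted for brevity.

    /* GPU-aware MPI sketch (illustrative only; assumes a CUDA-aware MPI
     * library such as MVAPICH2-GDR, one GPU per rank, and at least two ranks).
     * The device pointer is handed straight to MPI; the library handles the
     * data movement without an explicit host staging copy in the application. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        int rank;
        const int n = 1 << 20;          /* 1 M floats (~4 MB) */
        float *d_buf = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&d_buf, n * sizeof(float));
        cudaMemset(d_buf, 0, n * sizeof(float));

        if (rank == 0)
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);   /* device buffer */
        else if (rank == 1)
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }

Without such GPU awareness, the application would have to stage every message through host memory with explicit cudaMemcpy calls around each MPI operation.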

Translation process

Fig. 2 depicts the high-level method we follow to perform the various translational research activities. To transform our research into a high-performance MPI library and translate it to the HPC community, the MVAPICH project employs a research, development, and release cycle, as depicted in Fig. 3 [29]. Four primary phases are involved in this process, as described below. These phases have been repeated over the years as computing and networking technologies and

Impact and lessons learned

As of August 2020, the MVAPICH2 software libraries are being used by more than 3100 organizations in 89 countries. The list of organizations (which register voluntarily) is available from the ‘users’ tab of the project's website [5]. Furthermore, more than 810,000 downloads have taken place from the project site. The software is also being distributed by many vendors as part of their software distributions. MVAPICH2 is also powering many top supercomputers, including the 4th

Conclusion

In this paper, we discuss how high-impact computer science research can be translated into production-quality software for the community. The MVAPICH project, an academic project sustained over the last 19 years, has been successfully transformed into a production-quality, high-performance MPI library for the HPC community. The project involves a standard research and development process, a release cycle, participation in open-source communities, and deployment on production HPC clusters. As a

Authors’ contributions

Conception and design of study: D.K. Panda, H. Subramoni, C.-H. Chu, M. Bayatpour; acquisition of data: D.K. Panda, H. Subramoni; analysis and/or interpretation of data: C.-H. Chu, M. Bayatpour.

Drafting the manuscript: C.-H. Chu, M. Bayatpour, H. Subramoni, D.K. Panda; revising the manuscript critically for important intellectual content: C.-H. Chu, M. Bayatpour, H. Subramoni, D.K. Panda.

Acknowledgment

This research is supported in part by NSF grants #1931537, #1450440, #1664137, #1818253, and XRAC grant #NCR-130002.


References (33)

  • H. Meuer et al., TOP 500 Supercomputer Sites, 2020.

  • J. Dongarra et al., Race to exascale, Comput. Sci. Eng., 2019.

  • Message Passing Interface Forum, MPI: A Message-Passing Interface Standard, 1994.

  • InfiniBand Trade Association, http://www.infinibandta.com...

  • Network-Based Computing Laboratory, MVAPICH: MPI Over InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE, 2001.

  • J. Liu et al., High performance RDMA-based MPI implementation over InfiniBand, Int. J. Parallel Program., 2004.

  • Network-Based Computing Laboratory, NOWLAB::Publications, 2001.

  • W. Huang et al., Design of high performance MVAPICH2: MPI2 over InfiniBand, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), vol. 1, 2006.

  • J. Jose et al., Unifying UPC and MPI runtimes: experience with MVAPICH, Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 2010.

  • H. Wang et al., GPU-Aware MPI on RDMA-Enabled clusters: design, implementation and evaluation, IEEE Trans. Parallel Distrib. Syst., 2014.

  • S. Chakraborty et al., Designing scalable and high-performance MPI libraries on Amazon elastic fabric adapter, 2019 IEEE Symposium on High-Performance Interconnects (HOTI), 2019.

  • J.M. Hashmi et al., Designing efficient shared address space reduction collectives for multi-/many-cores, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018.

  • S. Chakraborty et al., Contention-aware Kernel-assisted MPI collectives for multi-/many-core systems, 2017 IEEE International Conference on Cluster Computing (CLUSTER), 2017.

  • J.M. Hashmi et al., FALCON: efficient designs for zero-copy MPI datatype processing on emerging architectures, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019.

  • H. Subramoni et al., Designing dynamic and adaptive MPI point-to-point communication protocols for efficient overlap of computation and communication, High Performance Computing, 2017.

  • H. Subramoni et al., Designing MPI library with dynamic connected transport (DCT) of InfiniBand: early experiences, Supercomputing, 2014.

Dhabaleswar K. (DK) Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at The Ohio State University. He has published over 500 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 3100 organizations worldwide (in 89 countries). As of August 2020, more than 810,000 downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 4th, 8th, 12th, 18th, and 19th ranked ones) in the TOP500 list. The RDMA packages for Apache Spark, Apache Hadoop, and Memcached, together with the OSU HiBD benchmarks from his group (http://hibd.cse.ohio-state.edu), are also publicly available. These libraries are currently being used by more than 330 organizations in 36 countries. As of August 2020, more than 37,100 downloads of these libraries have taken place. MPI-driven approaches to achieve high-performance and scalable versions of Deep Learning frameworks (TensorFlow, PyTorch, and MXNet) are available from https://hidl.cse.ohio-state.edu. Prof. Panda is an IEEE Fellow. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.

Hari Subramoni received the Ph.D. degree in Computer Science from The Ohio State University, Columbus, OH, in 2013. He has been a research scientist in the Department of Computer Science and Engineering at The Ohio State University, USA, since September 2015. His current research interests include high-performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network-topology-aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data, and cloud computing. He has published over 50 papers in international journals and conferences related to these research areas. Dr. Subramoni currently conducts research on, and works on the design and development of, the MVAPICH2, MVAPICH2-GDR, and MVAPICH2-X software packages. He is a member of IEEE. More details about Dr. Subramoni are available from http://www.cse.ohio-state.edu/~subramon.

Ching-Hsiang Chu received the Ph.D. degree in Computer Science and Engineering from The Ohio State University, Columbus, Ohio, USA. He received the BS and MS degrees in computer science and information engineering from the National Changhua University of Education, Taiwan, and the National Central University, Taiwan, respectively. His research interests include High-Performance Computing, GPU communication, and wireless networks. He has authored or co-authored over 30 papers in conferences and journals related to these research areas. More details about Dr. Chu are available from https://kingchc.gitlab.io.

Mohammadreza Bayatpour is a Ph.D. candidate in Computer Science and Engineering at The Ohio State University, Columbus, Ohio, USA. He received the BS degree in computer engineering from the Sharif University of Technology in Tehran, Iran, and joined the Network-Based Computing Lab at The Ohio State University in 2015. His research interests include High-Performance Computing and parallel computer architecture. He has authored or co-authored over 15 papers in conferences and journals related to these research areas. More details are available from http://web.cse.ohio-state.edu/~bayatpour.1/.
