Efficient OpenMP parallelization to a complex MPI parallel magnetohydrodynamics code
Introduction
The Block-Adaptive-Tree Solarwind Roe Upwind Scheme (BATS-R-US) [7], [19] is a multi-physics MHD code written in Fortran 90 that has been under active development at the University of Michigan for over 20 years. It is the most complex and often the computationally most expensive model in the Space Weather Modeling Framework (SWMF) [1], [29], [30], which has been applied to simulate multi-scale space physics systems including, but not limited to, the solar corona, the heliosphere, planetary magnetospheres, moons, comets and the outer heliosphere. For the sake of adaptive mesh refinement (AMR) and running efficiency, the code was designed from the very beginning to use a 3D Cartesian block-adaptive mesh with MPI parallelization [7], [26]. In 2012, the original block-adaptive implementation was replaced with the newly designed and implemented Block Adaptive Tree Library (BATL) [29] for creating, adapting, load-balancing and message-passing a 1, 2, or 3 dimensional block-adaptive grid in generalized coordinates.

The major advantages of the block-adaptive approach include a locally structured grid in each block, cache optimization due to the relatively small arrays associated with the grid blocks, loop optimization for the fixed-size loops over the cells in a block, and simple load balancing. Larger blocks reduce the total number of ghost cells surrounding the grid blocks, but make the grid adaptivity less economical. Smaller blocks allow precise grid adaptation, but require a large number of blocks and more storage and computation spent on ghost cells. A typical 3D block contains a modest number of grid cells in each dimension, with an additional 1–3 layers of ghost cells on each side depending on the order of the numerical scheme.
BATS-R-US has been gradually evolving into a comprehensive code by adding new schemes as well as new physical models. Currently, 60 equation sets from ideal hydrodynamics to the most recent six-moment fluid model [11] are available. The most important applications solve various forms of the magnetohydrodynamic (MHD) equations, including resistive, Hall, semi-relativistic, multi-species and multi-fluid MHD, optionally with anisotropic pressure, radiative transport and heat conduction. There are several choices of numerical schemes for the Riemann solvers, from the original Roe scheme to many others combined with a second order total variation diminishing (TVD) scheme or a fifth order accurate conservative finite difference scheme [5]. The time discretization can be explicit, point-implicit, semi-implicit, explicit/implicit or fully implicit. A high level abstraction of the code structure is presented in Fig. 1.
A powerful feature of BATS-R-US is the incorporation of user modules. These provide an interface for users to modify virtually any part of the kernel code without interfering with other modules, offering a clean and easy way to gain high-level control of the simulations. Currently there are 51 different user modules in the repository, mostly used to set up specific initial and boundary conditions and additional user-defined source terms for specific applications.
One of the key features of BATS-R-US is its excellent scalability on supercomputers. Previous benchmarks [29] with pure MPI parallelization have shown good strong scaling up to 8192 cores and weak scaling up to 16,384 cores within the memory limit of the testing platform. However, the grid-related pre-calculated information replicated on every MPI process to simplify the refinement algorithm creates an unavoidable memory redundancy on the computational nodes. To increase the scalability to even larger sizes, we need to reorganize the code and come up with a more advanced solution.
In this work, we have extended BATS-R-US with a hybrid MPI+OpenMP parallelization that significantly mitigates the limitations due to available memory. The strategies and issues are described in the next two sections, followed by performance test results and discussions.
Section snippets
Hybrid parallelization strategy
BATS-R-US was originally designed for pure MPI parallelization and did not take advantage of the rapid development of shared-memory multi-threading programming starting from the late 1990s [4], [6]. Even though MPI is generally observed to give better parallel scaling than OpenMP due to forced data locality, one obvious shortcoming of the pure MPI implementation is wasteful memory usage. In BATS-R-US, we support 1, 2, and 3 dimensional block-adaptive grids, where each block contains a fixed number of cells.
Overview
We set two high-level goals while modifying and improving the code:
- 1.
Backward compatibility: the code should still work correctly and efficiently without the OpenMP compilation flag.
- 2.
Minimal code changes: keep the development effort and modifications as small as possible.
BATS-R-US is able to solve the system of partial differential equations with a mixture of explicit and implicit time-stepping blocks distributed among the MPI processes. We first treat the explicit and then the implicit modules and add OpenMP directives
Nightly tests
For a comprehensive quality check and verification of the SWMF and the physics models it contains (including BATS-R-US), we have built an automated nightly test suite that exercises the code with various setups on various platforms. The latest version of the code is checked out from a central Git repository and 100 tests are performed on multiple platforms with different compilers, compiler flags and numbers of cores. The test results are monitored every day and have been archived since 2009. These
Performance
We have performed standard Brio–Wu MHD shock tube tests on various platforms. A 3D Cartesian grid is chosen, and dynamic AMR is not employed. The divergence of the magnetic field is controlled by hyperbolic cleaning [29] and the 8-wave scheme [19]. Despite its simplicity, this is a fairly representative test for various applications in terms of computational cost per grid cell, and it exercises the most important parts of the BATS-R-US code.
Conclusion
In this work, we have successfully extended our finite volume/difference MHD code BATS-R-US from a pure MPI to a hybrid MPI+OpenMP implementation, with modifications to only a small fraction of the source code. Good weak scaling is obtained at large core counts with both explicit and implicit time stepping. Using the hybrid parallelization, we are now able to solve problems more than an order of magnitude larger than before thanks to the usage
CRediT authorship contribution statement
Hongyang Zhou: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Visualization. Gábor Tóth: Methodology, Resources, Supervision, Funding acquisition.
Declaration of Competing Interest
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.jpdc.2020.02.004.
Acknowledgments
The authors are thankful for the useful comments and suggestions of the reviewers. This research was supported by NSF INSPIRE, United States, grant number PHY-1513379. The computational resources were funded by Blue Waters GLCPC, United States, and NSF Frontera, United States.
References (34)
- et al., A fifth-order finite difference scheme for hyperbolic equations on block-adaptive curvilinear grids, J. Comput. Phys. (2016)
- et al., Hyperbolic divergence cleaning for the MHD equations, J. Comput. Phys. (2002)
- et al., High performance computing using MPI and OpenMP on multi-core parallel systems, Parallel Comput. (2011)
- et al., A solution-adaptive upwind scheme for ideal magnetohydrodynamics, J. Comput. Phys. (1999)
- Approximate Riemann solvers, parameter vectors, and difference schemes, J. Comput. Phys. (1981)
- et al., A parallel explicit/implicit time stepping scheme on block-adaptive grids, J. Comput. Phys. (2006)
- et al., Adaptive numerical algorithms in space weather modeling, J. Comput. Phys. (2012)
- Space Weather Modeling Framework (2010)
- Multithreaded FLASH (2019)
- Porting with SGI's MPI and Intel OpenMP (2019)
- Parallel Programming in OpenMP
- OpenMP: An industry-standard API for shared-memory programming, Comput. Sci. Eng.
- An adaptive MHD method for global space weather simulations, IEEE Trans. Plasma Sci.
- Using memory performance to understand the mixed MPI/OpenMP model
- Performance comparison of pure MPI vs hybrid MPI-OpenMP parallelization models on SMP clusters
Cited by (6)
- An MPI-based parallel genetic algorithm for multiple geographical feature label placement based on the hybrid of fixed-sliding models, Geo-Spatial Information Science, 2024
- Development of NCL equivalent serial and parallel Python routines for meteorological data analysis, International Journal of High Performance Computing Applications, 2022
- Parallel implementation of an iterative solver for atmospheric tomography, Proceedings of the 21st International Conference on Computational Science and Its Applications (ICCSA 2021), 2021
- What sustained multi-disciplinary research can achieve: The space weather modeling framework, Journal of Space Weather and Space Climate, 2021
- Container-based Automatic Packaging Technology for Complex System Simulation Application, Journal of System Simulation (Xitong Fangzhen Xuebao), 2020
Hongyang Zhou, Research Assistant, is a PhD student at the University of Michigan, Ann Arbor. He has been working on coupled MHD-EPIC simulations of Ganymede's magnetosphere. He also has research interests in high-performance plasma simulations with various techniques.
Dr. Gábor Tóth, Research Professor, is an expert in algorithm and code development for space and plasma physics simulations. He has a leading role in the development of the Space Weather Modeling Framework, which can couple and execute about a dozen different space physics models covering domains from the surface of the Sun to the upper atmosphere of the Earth. He is one of the main developers of the BATS-R-US code, a multi-physics and multi-application MHD code using block-adaptive grids. He participated in designing the software architecture for the Center for Radiative Shock Hydrodynamics. He also designed the Versatile Advection Code, a general-purpose, publicly available hydrodynamics and MHD code.