A scalable distributed parallel simulation tool for the SWAT model

https://doi.org/10.1016/j.envsoft.2021.105133

Highlights

  • A distributed parallel simulation tool for the SWAT model (Spark-SWAT) is developed based on Spark.

  • Spark-SWAT can accelerate single and iterative model simulations.

  • Spark-SWAT can effectively solve high-computational-demand problems of complex models.

Abstract

High-fidelity hydrological models are increasingly built and used to investigate the effects of management activities and climate change on water availability and quality over large areas, using datasets of high spatial and temporal resolution. However, this fidelity comes at the price of greater computational demand and longer run times, which becomes challenging when modeling routines involve iterative model simulations. In this study, we proposed a generic scheme to reduce the Soil and Water Assessment Tool (SWAT) runtime by decomposing a watershed model into subbasin models and optimizing the subbasin model simulations through parallel execution. Based on this scheme, we implemented a generic tool named Spark-SWAT, which allows subbasin models to be simulated in parallel on a Spark cluster. We then evaluated Spark-SWAT with two sets of experiments to demonstrate its potential to accelerate single and iterative model simulations. In each test set, Spark-SWAT was applied to simulate 12 synthetic hydrological models with different input/output (I/O) burdens and river network complexities in parallel on a Spark cluster of five virtual machines. The single-model parallelization results showed that Spark-SWAT yielded a speedup of 7.84 for the most complex model but was less effective for simple models. When applied to use cases with iterative model runs, Spark-SWAT yielded speedups of 6.55–24.58, depending on model complexity. These results indicate that the proposed scheme can effectively solve high-computational-demand problems of complex models. As a subbasin-level parallelization tool, Spark-SWAT can be especially frugal in use cases in which model input changes pertain to only a few subbasins, because only the changed subbasins and those downstream of them require new computations. Moreover, this generic method could be applied to other subbasin-based hydrological models to alleviate I/O demands and optimize computational performance.
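The abstract's point that only changed subbasins and those downstream of them need new computations can be illustrated with a minimal Python sketch. It is not taken from the Spark-SWAT source: the function name `subbasins_to_rerun`, the `downstream` mapping, and the five-subbasin topology are hypothetical, and the sketch assumes each subbasin drains into at most one downstream subbasin.

```python
from typing import Dict, Iterable, Optional, Set


def subbasins_to_rerun(
    downstream: Dict[int, Optional[int]], changed: Iterable[int]
) -> Set[int]:
    """Return the changed subbasins plus every subbasin downstream of them."""
    affected: Set[int] = set()
    for start in changed:
        node: Optional[int] = start
        # Walk towards the outlet; stop once we join an already-visited path.
        while node is not None and node not in affected:
            affected.add(node)
            node = downstream[node]
    return affected


# Hypothetical watershed (not from the paper):
# subbasins 1 and 2 drain into 3; 3 and 4 drain into 5, the outlet.
DOWNSTREAM = {1: 3, 2: 3, 3: 5, 4: 5, 5: None}

print(sorted(subbasins_to_rerun(DOWNSTREAM, [2])))  # [2, 3, 5]
```

In this toy network, a change to the inputs of subbasin 2 leaves subbasins 1 and 4 untouched, so their earlier results could in principle be reused.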

Section snippets

Software availability

The source code and tools developed in this research are freely available to the public under the GNU General Public License. They are hosted on GitHub and can be accessed through the following link: https://github.com/djzhang80/Spark-SWAT.

SWAT model

SWAT (Arnold and Fohrer, 2005; Arnold et al., 1998) is a semidistributed, watershed-scale hydrological model that was initially developed by the Agricultural Research Service of the United States Department of Agriculture to predict the impact of watershed management practices on water, sediments, nutrients, pesticides, and fecal bacterial yields in the agricultural landscapes of North America. Due to its distributed, physically based and open-access nature, it has been adapted and applied to

Test environment

A Spark cluster consisting of five virtual machines was established to test the performance of Spark-SWAT. The virtual machines were built on three physical servers using VMware ESXi (version 4.1). Each physical server has 10 physical cores (20 logical cores), a clock speed of 2.2 GHz, 128 GB of random-access memory (RAM), and 5 TB of disk storage. Each of these physical servers runs VMware ESXi, which is an

Model result comparisons

In theory, the model outputs of the undivided and split models should be identical. However, the comparison in this study shows that the simulated stream flow, sediment, and other chemical outputs are not identical (Fig. 6). Fig. 6a shows the simulated stream flows of the undivided and split models and the errors between them (for clarity, only half a year of simulated stream flow is plotted). As seen in this plot, in most cases the simulated stream flows of the two models are identical; only a small
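A comparison of this kind can be sketched in a few lines of Python. The helper below is an illustrative assumption, not the procedure used in the paper: `compare_series` is a hypothetical name, and the flow values are made up purely to show the arithmetic.

```python
from typing import Sequence, Tuple


def compare_series(
    undivided: Sequence[float], split: Sequence[float], tol: float = 0.0
) -> Tuple[float, float]:
    """Return (max absolute error, fraction of time steps with error <= tol)."""
    if len(undivided) != len(split) or len(undivided) == 0:
        raise ValueError("series must be non-empty and of equal length")
    errors = [abs(a - b) for a, b in zip(undivided, split)]
    within = sum(1 for err in errors if err <= tol)
    return max(errors), within / len(errors)


# Made-up daily stream flows (not results from the paper).
undivided_flow = [10.0, 12.5, 11.0, 9.0]
split_flow = [10.0, 12.5, 11.25, 9.0]

max_error, identical_share = compare_series(undivided_flow, split_flow)
print(max_error, identical_share)  # 0.25 0.75
```

With `tol=0.0` the second value is the share of time steps at which the two series agree exactly; a small positive tolerance would instead count differences below a chosen threshold as agreement.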

Conclusions

In this paper, we proposed a scheme for SWAT parallelization by splitting a watershed model into multiple subbasin models and orchestrating parallel simulations according to the watershed route network. We implemented a parallel-computing tool for SWAT (Spark-SWAT) by using an open-source general-purpose distributed cluster-computing framework (Spark) according to the proposed scheme. Based on synthetic models, Spark-SWAT was tested and evaluated with a small Spark cluster consisting of five
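The idea of orchestrating subbasin simulations according to the routing network can be sketched as follows. This is a simplified illustration under stated assumptions, not the Spark-SWAT implementation: a local thread pool stands in for the Spark cluster, `run_subbasin` is a placeholder for a real subbasin model run, the level-by-level schedule is one possible ordering, and all names and values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def dependency_levels(downstream: Dict[int, Optional[int]]) -> List[List[int]]:
    """Group subbasins into levels; each level depends only on earlier ones."""
    pending = {sub: 0 for sub in downstream}  # count of unfinished upstream subbasins
    for target in downstream.values():
        if target is not None:
            pending[target] += 1
    level = sorted(sub for sub, count in pending.items() if count == 0)
    levels: List[List[int]] = []
    while level:
        levels.append(level)
        ready = []
        for sub in level:
            target = downstream[sub]
            if target is not None:
                pending[target] -= 1
                if pending[target] == 0:
                    ready.append(target)
        level = sorted(ready)
    return levels


def run_subbasin(local_runoff: float, inflows: List[float]) -> float:
    """Placeholder for one subbasin model run: outflow = local runoff + inflows."""
    return local_runoff + sum(inflows)


def simulate(
    downstream: Dict[int, Optional[int]],
    local_runoff: Dict[int, float],
    workers: int = 4,
) -> Dict[int, float]:
    """Run all subbasins level by level; subbasins within a level run concurrently."""
    upstream: Dict[int, List[int]] = {sub: [] for sub in downstream}
    for sub, target in downstream.items():
        if target is not None:
            upstream[target].append(sub)
    outflow: Dict[int, float] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for level in dependency_levels(downstream):
            futures = {
                sub: pool.submit(
                    run_subbasin,
                    local_runoff[sub],
                    [outflow[up] for up in upstream[sub]],
                )
                for sub in level
            }
            for sub, future in futures.items():
                outflow[sub] = future.result()
    return outflow


# Hypothetical watershed: 1 and 2 drain into 3; 3 and 4 drain into 5 (outlet).
DOWNSTREAM = {1: 3, 2: 3, 3: 5, 4: 5, 5: None}
RUNOFF = {1: 1.0, 2: 2.0, 3: 0.5, 4: 1.5, 5: 0.25}

print(dependency_levels(DOWNSTREAM))    # [[1, 2, 4], [3], [5]]
print(simulate(DOWNSTREAM, RUNOFF)[5])  # 5.25
```

Headwater subbasins 1, 2, and 4 have no upstream dependencies and can run at the same time, while subbasin 3 and the outlet must wait for their inflows. This is consistent with the reported pattern that parallelization helps complex models more than simple ones, although the sketch says nothing about actual speedups.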

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was financially supported by the Natural Science Foundation of Fujian Province [grant number 2020J01779], the Science and Technology Project of Xiamen [grant number 3502Z20183056], and the Science and Technology Climbing Program of Xiamen University of Technology [grant number XPDKT19014].

Note: Lin and Zhang contributed equally to this work.
