Generating high-resolution daily soil moisture by using spatial downscaling techniques: a comparison of six machine learning algorithms

https://doi.org/10.1016/j.advwatres.2020.103601

Highlights

  • Random forest algorithm achieved superior performance in soil moisture downscaling with high accuracy and robustness.

  • Regions within a single climate zone, with mild topographic variation and medium vegetation coverage, tended to yield more accurate downscaled soil moisture.

  • DEM, daytime LST, and NDVI were dominant parameters in soil moisture simulation.

Abstract

Tremendous efforts have been made to obtain surface soil moisture (SM) at high spatial resolutions from microwave-based products via spatial downscaling. In recent years, machine learning has been one of the most advanced techniques in SM spatial downscaling. The performance of a machine learning technique in SM spatial downscaling varies with the algorithm and the underlying surface; however, despite the importance of machine learning for SM downscaling, there are still only a few inter-comparisons, particularly over different surfaces. In this study, the performance of multiple machine learning algorithms in downscaling the ECV (the Essential Climate Variable Program initiated by the European Space Agency) SM dataset was validated over different underlying surfaces. Six machine learning algorithms, namely artificial neural network (ANN), Bayesian (BAYE), classification and regression trees (CART), K nearest neighbor (KNN), random forest (RF), and support vector machine (SVM), were implemented to establish the spatial downscaling models with reliable continuous in-situ SM observations over four case study areas: the Oklahoma Mesonet (OKM) in North America, the Naqu network (NAN) on the Tibetan Plateau, the REMEDHUS (REM) network in northeast Spain, and OZNET (OZN) in southeast Australia. The land surface temperature (LST), normalized difference vegetation index (NDVI), albedo, digital elevation model (DEM), and geographic coordinates were the explanatory variables, and their contributions to the downscaling models over different surfaces were quantified. The conclusions of the experiments can be summarized as follows: (1) The RF achieved excellent performance with a high correlation coefficient and a low regression error. The BAYE and KNN also demonstrated favorable capabilities for SM downscaling; however, the robustness of these algorithms needs further improvement. Numerous abnormal values were obtained in the downscaling process by the ANN, CART, and SVM methods, suggesting their comparative inadequacy in SM downscaling. (2) Downscaled 1-km resolution SM in REM generally presented a close correlation with the in-situ measurements, but its bias was larger than that in the other three regions. Comparatively, the smallest bias with the second highest correlation was found in the OZN region. It was primarily deduced that regions located in a single climate zone, with mild topographic variation and medium vegetation coverage, tended to produce high-accuracy results. (3) The feature importance index (FII) calculated by the RF model revealed that the DEM, daytime LST, and NDVI were dominant during reconstruction, particularly DEM in a study region with a large height difference. The specific FII of each independent variable varied remarkably across the different case study areas, probably owing to the complex hydrothermal as well as physical geography conditions. The results of this study demonstrate that the RF model outperforms the other models considered herein; furthermore, the effect of the FII of the variables over different underlying surfaces was demonstrated.
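The three quantities used above to judge downscaled SM against in-situ measurements (correlation coefficient, regression error, and bias) can be illustrated with a minimal sketch. It assumes that the regression error is measured as the root mean square error (RMSE) and that bias is the mean difference between downscaled and observed values; the function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(pred, obs):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


def rmse(pred, obs):
    """Root mean square error (assumed here as the 'regression error')."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def bias(pred, obs):
    """Mean difference, downscaled minus observed (positive = overestimate)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(pred)


# Made-up volumetric SM values (m3/m3), for illustration only.
downscaled = [0.20, 0.25, 0.30, 0.35]
in_situ = [0.18, 0.24, 0.27, 0.31]

r_value = pearson_r(downscaled, in_situ)
rmse_value = rmse(downscaled, in_situ)
bias_value = bias(downscaled, in_situ)
```

A high correlation with a large bias, as reported for REM, is possible because the correlation coefficient is insensitive to a constant offset, whereas bias and RMSE are not.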

Introduction

The surface soil moisture (SM) of the Earth continuously participates in the hydrological circulation and heat flux exchange (André et al., 1986), making it a critical topic in crop growth monitoring (Clevers and Leeuwen, 1996), agricultural yield estimation (Prasad et al., 2006), catchment hydrology processes (Western et al., 2004), and climate change studies (Seneviratne et al., 2010). In view of its significance, numerous SM datasets have emerged to serve different research needs (Reichle et al., 2009; Dorigo et al., 2011; Wagner et al., 2013b; Chan et al., 2016; Dorigo et al., 2017; Matei et al., 2017a; Matei et al., 2017b; Bindlish et al., 2018). In general, high accuracy, continuous time series, and complete spatial coverage are the common requirements for an ideal SM product.

In the 1950s, the Soviet Union and Mongolia started to record ground SM using monitoring sensors, such as coring devices (e.g., auger), to determine the national soil water content (Walker and Houser, 2001; Sheffield and Wood, 2006). Ground observations have a considerable advantage in that they provide precise SM measurements at a specific time, site, and depth via sensor networks. However, it was barely possible to draw nationwide SM maps from only dozens of ground stations, because of the enormous scale gap between point and regional SM values. In addition, the non-uniform sensor distribution, inconsistent monitoring periods, and differing sensor types further hindered the production of reliable SM maps by point-based spatial interpolation. To fundamentally resolve this problem, satellite-borne microwave radiometers and radars were used to retrieve large-scale surface SM, making it possible to obtain daily SM conditions across the globe (Vyas et al., 1985). Subsequently, numerous Earth observation programs were established, and corresponding SM products emerged (Njoku et al., 2003; Rosenqvist et al., 2007; Entekhabi et al., 2010; Chan et al., 2016).

It is well accepted that each satellite-derived SM product possesses unique characteristics, reflected in its specific retrieval band, inversion algorithm, spatial–temporal resolution, monitoring length, and null value strip (Angiulli et al., 2004). Considering the differing applicability of the various SM products, the European Space Agency (ESA) initiated the Essential Climate Variable Soil Moisture (ECV SM) Program in 2010, aiming to develop a global, daily, and long-time series SM product with a satisfactory performance by integrating multi-source satellite-based SM products. The ECV SM product was eventually built by combining three active microwave SM products (ERS-1/2 (Attema et al., 1998), MetOp-A ASCAT (Bartalis et al., 2007), and MetOp-B ASCAT (Wagner et al., 2013a)) and seven passive microwave SM products (SMMR (Paloscia et al., 2001), SSM/I (Ridder 2003), TRMM (Drusch et al., 2005), AMSR-E (Lacava et al., 2012), AMSR2 (Parinussa et al., 2013), Windsat (Li et al., 2010), and SMOS (Kerr et al., 2012)).

The ECV SM, which combines various single-sensor microwave SM products, has received extensive attention since its inception (Dorigo et al., 2017). A series of evaluations focusing on the ECV SM were conducted on regional, national, continental, and global scales (Dorigo et al., 2015b; An et al., 2016; Mcnally et al., 2016; Wang et al., 2016; Liu et al., 2018b). The results revealed that the ECV SM was highly consistent with ground observations overall, but that its performance varied strongly across networks. Further research demonstrated that its low pixel resolution (0.25°) hindered its application on the watershed scale (Peng et al., 2016; Liu et al., 2017a; Peng et al., 2017). Moreover, to analyze regional hydrologic phenomena, such as drought, flood, and irrigation, it might be necessary to utilize fine-resolution (i.e., 1-km) SM data to capture detailed changes in SM texture.

To improve the spatial resolution of satellite-retrieved SM products, considerable research has been conducted to develop SM downscaling techniques (Srivastava et al., 2013; Jing et al., 2018). Among them, machine learning (ML) techniques have received the most attention in SM assimilation, owing to their flexibility and capability to deal with massive remotely sensed data and non-linear problems (Ali et al., 2015). Drawing on probability theory, statistics, approximation theory, and complex algorithms, ML is a multi-disciplinary technique (Bishop 2006). Further, it has been widely utilized to accurately simulate spatial geoscience processes (Lary et al., 2016; Araya and Ghezzehei 2019). In terms of its utilization in SM studies, Liu et al. (2017) compared different ML approaches, namely classification and regression trees (CART), K nearest neighbor (KNN), Bayesian (BAYE), and random forest (RF) techniques, for downscaling the monthly ECV SM over northeast China. The results showed that the RF-downscaled SM matched the in-situ measurements better than the other methods did and responded positively to precipitation variation (Liu et al., 2017a). Srivastava et al. (2013) attempted to downscale the SMOS (Soil Moisture and Ocean Salinity) SM by adopting artificial neural network (ANN) and support vector machine (SVM) approaches with the moderate-resolution imaging spectroradiometer (MODIS) land surface temperature (LST) in southwest England. They successfully proved the effectiveness of artificial intelligence (AI) in satellite-based SM downscaling (Srivastava et al., 2013). Im et al. (2016) conducted a case study in South Korea and Australia, downscaling the AMSR-E (Advanced Microwave Scanning Radiometer-EOS) SM using ML-based approaches, and achieved high-level coefficients against ground observations (Im et al., 2016).
Being a subcategory of AI, ML has proved its advantages in nonparametric regression in geoscience and remote sensing fields (Lary et al., 2016). Besides SM, the above ML techniques can also be reliable methods to improve estimations in hydrology, meteorology, and their related fields (Wu and Chau 2011; Han and Cluckie 2015; Jing et al., 2016a; Jing et al., 2016b; Anton et al., 2019a).

As demonstrated by the above research results, the ML algorithm family has been widely applied in high-resolution SM generation driven by multi-source data fusion. The performance of ML approaches in SM spatial downscaling varies with the algorithm and the underlying surface; however, despite the importance of their application in SM downscaling, there have been only a few inter-comparisons, particularly over different surfaces. Hence, the main objective of our research was to evaluate the capabilities of the ANN, BAYE, CART, KNN, RF, and SVM techniques in the ECV SM downscaling application. This was achieved using MODIS products over different continental case study areas with diverse land surface parameter features and continuous stable ground observations.
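The general idea behind regression-based downscaling can be illustrated with a minimal sketch: a model is fitted between coarse-scale SM and explanatory variables aggregated to the coarse grid (0.25° for the ECV SM), and is then applied to the same variables at fine resolution (e.g., 1 km). The sketch below uses a hand-written KNN regressor, one of the six algorithm families considered, only because it needs no external library; it is not the authors' implementation. All names and numerical values are hypothetical, and the predictor set (daytime LST, NDVI, DEM) is a reduced, illustrative subset.

```python
import math


def column_stats(rows):
    """Per-column mean and standard deviation (std falls back to 1.0 if zero)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return means, stds


def standardize(rows, means, stds):
    """Scale each column so that predictors with large units do not dominate."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training samples nearest to the query."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical coarse-grid samples: [daytime LST (K), NDVI, DEM (m)] -> SM (m3/m3).
coarse_x = [
    [300, 0.2, 400],
    [295, 0.4, 450],
    [290, 0.6, 500],
    [305, 0.1, 380],
    [298, 0.3, 420],
]
coarse_sm = [0.15, 0.22, 0.30, 0.10, 0.18]

# Hypothetical fine-grid (1-km) predictor values inside the coarse pixels.
fine_x = [[299, 0.25, 410], [291, 0.55, 490]]

# Fit the scaling on the coarse samples and reuse it for the fine pixels.
means, stds = column_stats(coarse_x)
coarse_scaled = standardize(coarse_x, means, stds)
fine_scaled = standardize(fine_x, means, stds)

# Apply the coarse-scale relationship to each fine-resolution pixel.
fine_sm = [knn_predict(coarse_scaled, coarse_sm, q, k=3) for q in fine_scaled]
```

Swapping `knn_predict` for another regressor leaves the rest of the workflow unchanged, which is what makes a like-for-like comparison of algorithms possible.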

Section snippets

Study areas

This research attempted to explore the performances of several ML techniques in generating high-resolution SM for multiple underlying surfaces. We selected four regions with stable long-time series data and representative physical geography background as the typical areas to conduct the research. They were located in the south of the high plains of the Mississippi River Basin (called the Oklahoma Mesonet (OKM)), northern part of the Iberian Peninsula (called REMEDHUS (REM)), southeastern

Generating high spatial resolution soil moisture based on a downscaling approach

The major steps of the approach for downscaling soil moisture data are shown as follows, and the specific experimental process is shown in Fig. 2:

  • (1)

    First, the collected explanatory variables were classified into dynamic variables and steady-state variables. As shown in Fig. 2, the dynamic variables comprised albedo, NDVI, LST_D, LST_N, and ∆LST, and the steady-state variables included DEM, longitude, and latitude. The daily varying dynamic data and the steady-state parameters were taken together to

Reconstruction comparison

First, regression models were established on the original ECV SM scale (0.25° grid resolution). Further, the adopted approaches reconstructed the SM on a 0.25° scale. The reconstruction capacities of the six regression models were preliminarily evaluated across different regions. The experiment first applied the regression model to reconstruct the ECV SM without a scale transition. Scatterplots and regression correlations of each ML technique applied in the four selected regions are shown in

Discussions

AI-based ML approaches have been adopted to solve regression and prediction problems in the geoscience field since their inception. Numerous studies have focused on applying ML techniques to acquire downscaled SM with one case study area (Abbaszadeh et al., 2019; Lee et al., 2019). Our comparative research evaluated the performances of various ML techniques in SM downscaling. In addition, four case study areas, which had quite different underlying surfaces and hydrothermal combination

Conclusions

In recent years, tremendous efforts have been made to acquire high-precision, high-resolution SM products across the world. Further, ML techniques have revealed unique characteristics compared to statistical and physical models in scaling conversion (Ge et al., 2019). This study systematically compared the performances of ML techniques in SM downscaling over four continental case study areas. The applicability of algorithms, regional heterogeneities, and variable importance were considered

Declaration of Competing Interest

None.

Acknowledgments

This research was jointly supported by GDAS' Project of Science and Technology Development (2020GDASYL-20200103006, 2016GDASRC-0211, 2017GDASCX-0101, 0601, 0403 and 0801; 2018GDASCX-0101, 0403; 2019GDASYL-0502001, 0301001, 0302001, 0501001, and 0401001), the National Natural Science Foundation of China (Grant 41801362, 41976190), the Guangdong Innovative and Entrepreneurial Research Team Program (2016ZT06D336), Guangzhou Science and Technology Project (201902010033), the Natural Science

References (138)

  • T. Farrar et al.

    The influence of soil type on the relationships between NDVI, rainfall, and soil moisture in semiarid Botswana. II. NDVI response to soil moisture

    Remote Sens. Environ.

    (1994)
  • T.J. Farrar et al.

    The influence of soil type on the relationships between NDVI, rainfall, and soil moisture in semiarid Botswana. II. NDVI response to soil moisture

    Remote Sens. Environ.

    (1994)
  • Y. Ge et al.

    Principles and Methods of Scaling Geospatial Earth Science Data

    (2019)
  • B. Ghanbarian et al.

    Sample dimensions effect on prediction of soil water retention curve and saturated hydraulic conductivity

    J. Hydrol. (Amst.)

    (2015)
  • M.A. Giraldo et al.

    Ground and surface temperature variability for remote sensing of soil moisture in a heterogeneous landscape

    J. Hydrol. (Amst.)

    (2009)
  • J.P. Guerschman et al.

    Scaling of potential evapotranspiration with MODIS data reproduces flux observations and catchment water balance observations across Australia

    J. Hydrol. (Amst.)

    (2009)
  • C. Huang et al.

    Retrieving soil temperature profile by assimilating MODIS LST products with ensemble Kalman filter

    Remote Sens. Environ.

    (2008)
  • D.J. Lary et al.

    Machine learning in geosciences and remote sensing

    Geosci. Front.

    (2016)
  • Y. Liu et al.

    Comparison of different machine learning approaches for monthly satellite-based soil moisture downscaling over Northeast China

    Remote Sens. (Basel)

    (2017)
  • Y. Liu et al.

    Comparison of different machine learning approaches for monthly satellite-based soil moisture downscaling over Northeast China

    Remote Sens. (Basel)

    (2018)
  • H. Ma et al.

    Satellite surface soil moisture from SMAP, SMOS, AMSR2 and ESA CCI: a comprehensive assessment using global ground-based observations

    Remote Sens. Environ.

    (2019)
  • D.B. Madsen

    Conceptualizing the Tibetan Plateau: environmental constraints on the peopling of the “Third Pole”

    Archaeol. Res. Asia

    (2016)
  • O. Matei et al.

    A data mining system for real time soil moisture prediction

    Proc. Eng.

    (2017)
  • A. Mcnally et al.

    Evaluating ESA CCI soil moisture in East Africa

    Int. J. Appl. Earth Observ. Geoinformation

    (2016)
  • S. Park et al.

    Drought monitoring using high resolution soil moisture through multi-sensor satellite data fusion over the Korean peninsula

    Agric. For. Meteorol.

    (2017)
  • P. Abbaszadeh et al.

    Downscaling SMAP radiometer soil moisture over the CONUS using an ensemble learning method

    Water Resour. Res.

    (2019)
  • Al, R.L.E.E. (2007). The Oklahoma Mesonet: a Multi-Purpose Network for Water Resources Monitoring and...
  • I. Ali et al.

    Review of machine learning approaches for biomass and soil moisture retrievals from remote sensing data

    Remote Sens. (Basel)

    (2015)
  • N.S. Altman

    An Introduction to Kernel and nearest-neighbor nonparametric regression

    Am. Stat.

    (1992)
  • J.C. André et al.

    HAPEX-MOBILHY: a hydrologic atmospheric experiment for the study of water budget and evaporation flux at the climatic scale

    Bull. Am. Meteorol. Soc.

    (1986)
  • M. Angiulli et al.

    L-band active-passive and L-C-X-bands passive data for soil moisture retrieval, two different approaches in comparison

  • C.A. Anton et al.

    Performance analysis of collaborative data mining vs context aware data mining in a practical scenario for predicting air humidity.

  • C.A. Anton et al.

    Collaborative data mining in agriculture for prediction of soil moisture and temperature

  • S.N. Araya et al.

    Using machine learning for prediction of saturated hydraulic conductivity and its sensitivity to soil structural perturbations

    Water Resour. Res.

    (2019)
  • E.P.W. Attema et al.

    ERS-1/2 SAR land applications: overview and main results

  • A. Avram et al.

    Context-Aware Data Mining vs Classical Data Mining: case Study on Predicting Soil Moisture

    Soft Comput.

    (2019)
  • L. Bai et al.

    Estimation of surface soil moisture with downscaled land surface temperatures using a data fusion approach for heterogeneous agricultural land

    Water Resour. Res.

    (2019)
  • Z. Bartalis et al.

    Initial soil moisture retrievals from the METOP-A advanced scatterometer (ASCAT)

    Geophys. Res. Lett.

    (2007)
  • J. Bian et al.

    Reconstruction of NDVI time-series datasets of MODIS based on Savitzky-Golay filter

    J. Remote Sens.

    (2010)
  • R. Bindlish et al.

    GCOM-W AMSR2 soil moisture product validation using core validation sites

    IEEE J. Selected Top. Appl. Earth Observ. Remote Sens.

    (2018)
  • C.M. Bishop

    Pattern recognition and machine learning

    Publ. Am. Stat. Assoc.

    (2006)
  • L.I. Breiman et al.

    Classification and regression trees (CART)

    Biometrics

    (1984)
  • F.V. Brock et al.

    The Oklahoma Mesonet: a technical overview

    J. Atmos. Oceanic Technol.

    (1995)
  • F.V. Brock et al.

    The Oklahoma Mesonet: a technical overview

    J. Atmos. Oceanic Technol.

    (2018)
  • T. Chai et al.

    Root mean square error (RMSE) or mean absolute error (MAE)? – arguments against avoiding RMSE in the literature

    Geosci. Model Dev.

    (2014)
  • S.K. Chan et al.

    Assessment of the SMAP passive soil moisture product

    IEEE Trans. Geosci. Remote Sens.

    (2016)
  • Chen, C. and L. Breiman (2004). Using Random Forest to Learn Imbalanced...
  • Z.G. Chen et al.

    Remote sensing image merging based on Savitzky-Golay method

    Geogr. Geo-Information Sci.

    (2011)
  • G.W. Cheung et al.

    Evaluating goodness-of-fit indexes for testing measurement invariance

    Struct. Equation Model.

    (2002)