Multi-sensor fusion using random forests for daily fractional snow cover at 30 m

https://doi.org/10.1016/j.rse.2021.112608

Highlights

  • Data from two satellites were fused for daily 30 m fractional snow cover (fSCA)

  • Satellite data were combined using a two-stage random forest algorithm

  • fSCA from MODIS is the most important variable for classification and prediction

  • Fusion has an accuracy of 97% with similar performance throughout the year

Abstract

In addition to providing water for nearly 2 billion people, snow drives resource selection by wildlife and influences the behavior and demography of many species. Because snow cover is highly spatially and temporally variable, mapping its extent using currently available satellite data remains a challenge. At present, there are no sensors acquiring daily data of Earth's entire surface at fine spatial resolutions (< 30 m) in the wavelengths required for snow cover retrieval, namely visible, near-infrared, and shortwave infrared. Fine scale observations at 30 m from Landsat have been available at 16-day intervals since 1982 and at 8-day intervals since 1999. However, over such intervals, snow can accumulate, ablate, or both, making the Landsat data ineffective for many applications. Conversely, atmospherically corrected daily reflectance data from the Moderate Resolution Imaging Spectroradiometer (MODIS) have a coarse spatial resolution of 463 m and thus are not ideal for snow cover mapping either. This spatial and temporal resolution tradeoff limits the use of these data for a wide range of snow cover applications and indicates a pressing need for data fusion. To address this need, we use a physically-based, spectral-mixture-analysis approach for mapping fractional snow cover (fSCA) and a two-stage random forest algorithm to produce daily 30 m fSCA. We test our algorithm in the US Sierra Nevada and find MODIS fSCA is the most important predictor. We cross validate using 170 Landsat scenes. Although snow cover varies immensely in time, we find little variation in errors between seasons, a small bias of 0.01, and an overall accuracy of 0.97 with slightly higher precision than recall. This technique for accurate, daily, high-resolution snow cover retrievals could be applied more broadly for analyses of regional energy budget, validating snow cover in global and regional models, and for quantifying changes in the availability of biotic resources in ecosystems.
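As a rough illustration of the validation statistics quoted in the abstract (bias, accuracy, precision, recall), the following minimal Python sketch computes them for paired fused and reference fSCA values. The function name `validation_metrics`, the sample values, and the snow/no-snow threshold of 0.15 are assumptions made for this illustration; the excerpt does not state how pixels were binarized for the classification metrics.

```python
import math


def validation_metrics(predicted, reference, snow_threshold=0.15):
    """Compare predicted and reference fSCA values (each in [0, 1]).

    snow_threshold is a hypothetical cutoff used only to turn fSCA into
    snow / no-snow labels for accuracy, precision, and recall.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(predicted)
    # Bias: mean signed difference between prediction and reference.
    bias = sum(p - r for p, r in zip(predicted, reference)) / n

    tp = fp = fn = tn = 0
    for p, r in zip(predicted, reference):
        pred_snow = p >= snow_threshold
        ref_snow = r >= snow_threshold
        if pred_snow and ref_snow:
            tp += 1
        elif pred_snow:
            fp += 1
        elif ref_snow:
            fn += 1
        else:
            tn += 1

    return {
        "bias": bias,
        "accuracy": (tp + tn) / n,
        "precision": tp / (tp + fp) if tp + fp else math.nan,
        "recall": tp / (tp + fn) if tp + fn else math.nan,
    }


# Made-up values for six pixels (not data from the study).
predicted = [0.0, 0.2, 0.9, 0.5, 0.0, 0.3]
reference = [0.0, 0.1, 1.0, 0.5, 0.2, 0.0]
metrics = validation_metrics(predicted, reference)
```

Precision higher than recall, as reported in the abstract, would mean that false snow detections are rarer than missed snow under whatever binarization is used.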

Introduction

One sixth to one quarter of the global population depends on water from snow and ice melt (Barnett et al., 2005; Mankin et al., 2015). Snow is a natural reservoir of water, but it has a complex spatial and temporal distribution that makes accurate snow cover mapping and the estimation of snow water equivalent (SWE) difficult. These difficulties in snow estimation manifest in runoff forecast errors and can have significant impacts on water managers (Stillinger et al., 2021), agricultural users, and others who depend on accurate water allocation estimates.

In addition to its impact on water resources, snow influences the survival of mountain ungulates and profoundly affects the energetic costs of living in colder environments (Conner et al., 2018; Stephenson et al., 2020). In mountainous or northern climates, snow determines the availability of forage and the energy that must be expended to acquire food for much of the year (Parker et al., 1984; Schwab and Pitt, 1987). For example, bighorn sheep in alpine environments exhibit multiple migratory behaviors including alpine residency during winter (Spitz et al., 2018) and select habitat in response to snowscapes. Finer scale snow metrics improve predictive models of the movement and resource selection of these animals (Mahoney et al., 2018). Consequently, snow cover is an essential parameter in habitat models designed to predict resource selection (Spitz et al., 2020).

To better understand these impacts of snow cover on both humans and wildlife, we need to look at the challenges surrounding the satellite retrieval of fractional snow cover (fSCA) and related SWE estimates. To calculate SWE, accurate estimates of fSCA are needed, for example in energy balance models (Bair et al., 2016; Rittger et al., 2016). In a machine learning approach to predicting SWE, fSCA was the most important predictor (Bair et al., 2018a). However, accurate estimates of fSCA are difficult to calculate at both high spatial and high temporal resolution. This is especially true in mountainous terrain where the spatial variability of snow caused by wind redistribution (Liston and Sturm, 1998) and other factors can be substantial. For example, Airborne Snow Observatory (ASO; Painter et al., 2016) retrievals that use a combination of lidar and hyperspectral sensors to estimate SWE at 50 m resolution routinely produce adjacent pixels with an order of magnitude difference in SWE. Likewise, bare and snow-covered areas are often found within a few meters of each other because of differences in topography or vegetation (Rosenthal and Dozier, 1996; Selkowitz et al., 2014). To further complicate matters, areal snow cover can change on the order of hours, especially during melt periods, when snow can be present in the morning and gone by afternoon. Rapid temporal changes influence accurate modeling of snowmelt, requiring imagery with a temporal resolution of at least one day (Slater et al., 2013). SWE models forced using 30 m fSCA data have shown more accurate results compared to those models forced with 500 m fSCA (Molotch and Margulis, 2008; Margulis et al., 2019).

Thus, for accurate spatial models of snow distribution and snowmelt, two competing requirements are needed: decameter scale spatial resolution and daily temporal resolution. No satellite or constellation of satellites currently satisfies these requirements. Three widely used sensors for remote sensing of snow are MODIS (daily, 500 m resolution), Landsat 5, and Landsat 7 (16-day for each or 8-day combined, 30 m resolution). More recently, data are available from Landsat 8 (16-day, 30 m resolution) and Sentinel 2a and 2b (10-day for each or 5-day combined, 10, 20, or 60 m resolution depending on band). Individually, none of these satellites satisfy our requirements, nor do they when combined together, though the scheduled launch of Landsat 9 in September 2021 will bring us one step closer.

The need for snow cover products with high temporal and spatial resolution motivates sensor fusion. A few studies have applied fusion techniques to take advantage of both the spatial and temporal resolution, though many use binary snow cover for coarse resolution data, fine resolution data, or both. Baumgartner et al. (1987) compared Landsat Multispectral Scanner System with Advanced Very High Resolution Radiometer (AVHRR) imagery and suggested the two could be fused, but stopped short of creating a fused product. Durand et al. (2008) created a fused MODIS and Landsat product. The researchers combined fSCA derived from coarsening the MOD10A v.4 binary snow cover data (Hall et al., 2002) to 1 km with 30 m fractional snow cover estimated from Landsat 7 Enhanced Thematic Mapper+ (ETM+) reflectance data (Painter et al., 2003; Painter et al., 2009). The novel linear technique they used to combine the two data streams, when applied to the upper Rio Grande in Colorado (US) to constrain a SWE reconstruction model, resulted in a 51% reduction in mean absolute error (MAE) and a 49% reduction in bias for SWE. Likewise, Durand et al. (2008) found that using ETM+ data upscaled to 100 m provided substantial improvement in SWE with 23% MAE compared to 50% MAE for the coarser MODIS data.
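Deriving a fractional value by coarsening binary snow cover, as described for Durand et al. (2008), can be pictured as block averaging: each coarse cell takes the mean of the binary (0 or 1) fine cells it contains. The Python sketch below is only an illustration of that idea, not the procedure used in that study; the function name `coarsen_binary_to_fraction`, the toy grid, and the factor of 2 (standing in for a nominal 500 m to 1 km aggregation) are assumptions.

```python
def coarsen_binary_to_fraction(binary_grid, factor):
    """Average non-overlapping factor x factor blocks of a 0/1 snow grid.

    Returns a coarser grid whose cells hold the fraction of snow-covered
    fine cells in each block. Illustrative only; real products also need
    handling for clouds, no-data cells, and reprojection.
    """
    rows, cols = len(binary_grid), len(binary_grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")

    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            block = [
                binary_grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Toy 4 x 4 binary snow map (1 = snow, 0 = no snow), made up for illustration.
fine = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
]
coarse = coarsen_binary_to_fraction(fine, 2)
print(coarse)  # [[0.75, 0.0], [1.0, 0.75]]
```

A fraction obtained this way can only take a few discrete values (here multiples of 0.25) and inherits any misclassification in the binary input.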

Like Durand et al. (2008), Berman et al. (2018) used MODIS and Landsat snow maps but took an alternate approach with a dynamic time warping technique to create daily 30 m snow cover estimates. Instead of binary snow cover, they used fractional snow cover from MOD10A1 (Salomonson and Appel, 2004). Berman et al. (2018) report a Root Mean Squared Error (RMSE) of 31% to 68% for fSCA, validated with ground measurements (snow pillows and time-lapse cameras). It is notable that the regions selected in that study are particularly challenging forest locations for optical remote sensing, with 60% to >80% canopy cover. High MODIS sensor view zenith angles that can completely obstruct snow cover in forests (Dozier et al., 2008; Rittger et al., 2020) were not accounted for, and only cloud-free (not gap-filled or smoothed) imagery (Hall et al., 2010) was used, thereby greatly reducing the effective temporal resolution for the inputs. Several studies have downscaled snow cover and compared it between products, e.g., Landsat with MODIS snow cover data (Walters et al., 2014; Li et al., 2015), and find reasonable accuracy, but the maps are binary, and binary maps overestimate snow at high fractions and underestimate snow at low fractions (Rittger et al., 2013). A number of previous studies focusing on snow cover but not SWE have combined MODIS data and binary snow maps from Landsat at 30 m or unmanned aerial vehicles at similar or higher spatial resolution (Dobreva and Klein, 2011; Moosavi et al., 2014; Liang et al., 2017; Kuter et al., 2018; Liu et al., 2020; Kuter, 2021). The studies used varying methods such as linear regression, multivariate adaptive regression splines, neural networks, support vector machines and random forests, but focused on improving the accuracy of daily maps at 500 m rather than providing a higher spatial resolution daily product.

A goal of data fusion is that the fused product is more consistent or accurate than the individual parts. These earlier fusion methods used either the Normalized Difference Snow Index (NDSI) or binary versions of the MOD10A snow cover product. With improved fSCA retrieval algorithms for both MODIS and Landsat input data streams, along with appropriate treatment of data gaps, clouds, high sensor view angles, and saturation, an improved daily moderate-resolution snow cover fusion product may be realized. While the combination of the newest generation satellites Landsat 8, Sentinel 2a, and Sentinel 2b provides observations of Earth's surface approximately every three days (Claverie et al., 2018), these new data do not address snow mapping in the historical context. In addition, as previously noted, snow can still accumulate, melt, or both within a three-day period, supporting the need for fusion in both the historical and future contexts. Analysis comparing the 30 m snow maps at the three-day interval to MODIS snow maps will inform our understanding of the various methods later described in estimating daily snow cover accurately before the application of interpolation or fusion methods.

Compared to previous methods, our efforts described in this paper represent a step forward through the use of spectral mixture analysis for snow cover mapping at both the Landsat and MODIS resolution together with a new machine learning approach to combine them. Section 2 describes the study area in the Sierra Nevada, USA; the satellite data, which include observations from Landsat and MODIS; the algorithms, including the spectral mixture analysis used to estimate fSCA; and the fusion methods. In particular, we use random forests, but introduce a novel two-stage conditional approach that generalizes the original random forest concept. Validation metrics follow, along with sampling in Section 3. Section 4 shows the results and discusses the training sample size analysis, variable importance, seasonal performance, and limitations of the approach. Section 5 concludes with a summary and future directions.
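The control flow of a two-stage conditional prediction can be sketched in a few lines of Python. This excerpt does not spell out the stages, so the sketch rests on an assumption suggested by the highlights ("classification and prediction"): stage 1 classifies each 30 m pixel as snow-free, fully snow-covered, or mixed, and stage 2 predicts a fractional value only for mixed pixels. In the paper both stages are random forests; here `toy_classifier` and `toy_regressor` are hypothetical stand-ins, and the feature names `modis_fsca` and `elev_anomaly` are invented for illustration.

```python
NO_SNOW, MIXED, FULL_SNOW = "no_snow", "mixed", "full_snow"


def two_stage_predict(pixels, classify, regress):
    """Stage 1 assigns a class; stage 2 runs only for MIXED pixels."""
    fsca = []
    for features in pixels:
        label = classify(features)
        if label == NO_SNOW:
            fsca.append(0.0)
        elif label == FULL_SNOW:
            fsca.append(1.0)
        else:
            # Clamp the regression output to the valid fSCA range [0, 1].
            fsca.append(min(1.0, max(0.0, regress(features))))
    return fsca


def toy_classifier(features):
    """Hypothetical stand-in for a fitted stage-1 classifier."""
    if features["modis_fsca"] < 0.05:
        return NO_SNOW
    if features["modis_fsca"] > 0.95:
        return FULL_SNOW
    return MIXED


def toy_regressor(features):
    """Hypothetical stand-in for a fitted stage-2 regressor."""
    return features["modis_fsca"] + 0.1 * features["elev_anomaly"]


# Three made-up 30 m pixels with invented predictor values.
pixels = [
    {"modis_fsca": 0.0, "elev_anomaly": 1.0},
    {"modis_fsca": 0.5, "elev_anomaly": -1.0},
    {"modis_fsca": 1.0, "elev_anomaly": -1.0},
]
fused = two_stage_predict(pixels, toy_classifier, toy_regressor)
```

Separating the exact 0 and 1 cases from the mixed case is one way to keep a regression model from smearing snow-free and fully covered pixels toward intermediate values; whether this matches the authors' motivation is not stated in the excerpt.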

Section snippets

Data and methods

Our study region in the Sierra Nevada USA spans from the North Fork of the American River basin in the north to the South Fork of the Kern River basin in the south (Fig. 1). This area of the Sierra Nevada is characterized by high vertical relief (up to 3000 m), with areas on the west side of the crest receiving far more precipitation than those to the east. For example, SWE on the ground measured at 2940 m on Mammoth Mountain near the crest reaches an annual peak of 128 cm (Bair et al., 2018b),

Validation metrics and sampling

There are several predictor variables with missing (not-applicable) or null values and also Landsat images with null values over the study region, for example, in cloudy or deeply shaded areas and large water bodies. These pixels are removed from validation and sampling along with pixels saturated in all three bands as described in Section 2.1.2. Pixels saturated in all three bands would lead to an overestimate in snow cover (Rittger et al., 2021). To grow the random forests, we select a subset

Results and discussion

To illustrate the fSCA based on our approach, we consider two example days with distinct climatological characteristics. For the ensuing results, we fit the two-stage model and downscale on a cluster with 193 GB of RAM and 2× Intel Xeon Gold 6130, 2.1GHz 16-core Skylake processors. Fig. 4 shows data for January 25, 2006, and March 17, 2007, which were unusually dry and wet years respectively (Rittger et al., 2016). The MODIS fSCA used as a predictor is shown in Fig. 4a and d, downscaled

Conclusion

We used a scale-invariant, two-stage random forest model along with the best available fractional snow cover estimates from spectral mixture analysis to create daily snow cover maps at 30 m. Validation metrics show that, with adequate sampling over time and space, we can combine infrequent snow mapping from 30 m data with daily data at 500 m to produce accurate daily maps of snow at 30 m resolution. The model performed similarly in all seasons of the year and at different aspects with larger

Funding

California Department of Fish and Wildlife, NASA award 80NSSC18K1489; NASA award 80NSSC18K0427; NOAA award NA18OAR4590380, and University of California award LFR-18-54831.

Data statement

To satisfy NASA open data policies, all data are available in online repositories in GeoTIFF format (Rittger, 2021): Landsat and MODIS snow cover, the predictors from Table 1, and snow cover from the two-stage random forest model.

The two-stage random forest model is currently in a public repository on GitHub (Krock et al., 2019).

CRediT authorship contribution statement

Karl Rittger: Conceptualization, Methodology, Formal analysis, Data curation, Visualization, Writing – original draft, Writing – review & editing, Project administration, Funding acquisition. Mitchell Krock: Methodology, Software, Validation, Formal analysis, Investigation, Visualization, Writing – original draft, Writing – review & editing. William Kleiber: Methodology, Software, Validation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (60)

  • V. Moosavi et al.

    Fractional snow cover mapping from MODIS data using wavelet-artificial intelligence hybrid models

    J. Hydrol.

    (2014)
  • T.H. Painter et al.

    Retrieval of subpixel snow-covered area and grain size from imaging spectrometer data

    Remote Sens. Environ.

    (2003)
  • T.H. Painter et al.

    Retrieval of subpixel snow-covered area, grain size, and albedo from MODIS

    Remote Sens. Environ.

    (2009)
  • T.H. Painter et al.

    The airborne snow observatory: fusion of scanning lidar, imaging spectrometer, and physically-based modeling for mapping snow water equivalent and snow albedo

    Remote Sens. Environ.

    (2016)
  • M.S. Raleigh et al.

    Ground-based testing of MODIS fractional snow cover in subalpine meadows and forests of the Sierra Nevada

    Remote Sens. Environ.

    (2013)
  • K. Rittger et al.

    Assessment of methods for mapping snow cover from MODIS

    Adv. Water Resour.

    (2013)
  • K. Rittger et al.

    Spatial estimates of snow water equivalent from reconstruction

    Adv. Water Resour.

    (2016)
  • V.V. Salomonson et al.

    Estimating fractional snow cover from MODIS using the normalized difference snow index

    Remote Sens. Environ.

    (2004)
  • A.G. Slater et al.

    Uncertainty in seasonal snow reconstruction: relative impacts of model forcing and image availability

    Adv. Water Resour.

    (2013)
  • R.D. Walters et al.

    A physiographic approach to downscaling fractional snow cover data in mountainous regions

    Remote Sens. Environ.

    (2014)
  • E.H. Bair et al.

    Validating reconstruction of snow water equivalent in California’s Sierra Nevada using measurements from the NASA airborne snow observatory

    Water Resour. Res.

    (2016)
  • E.H. Bair et al.

    Using machine learning for real-time estimates of snow water equivalent in the watersheds of Afghanistan

    Cryosphere

    (2018)
  • E.H. Bair et al.

    Hourly mass and snow energy balance measurements from Mammoth Mountain, CA USA, 2011–2017

    Earth System Science Data

    (2018)
  • E.H. Bair et al.

    An examination of snow albedo estimates from MODIS and their impact on snow water equivalent reconstruction

    Water Resour. Res.

    (2019)
  • T.P. Barnett et al.

    Potential impacts of a warming climate on water availability in snow-dominated regions

    Nature

    (2005)
  • M.F. Baumgartner et al.

    Toward snowmelt runoff forecast based on multisensor remote-sensing information

    IEEE Trans. Geosci. Remote Sens.

    (1987)
  • L. Breiman

    Random forests

    Mach. Learn.

    (2001)
  • M.M. Conner et al.

    Survival analysis: informing recovery of Sierra Nevada bighorn sheep

    J. Wildl. Manag.

    (2018)
  • B.A. Cosgrove et al.

    Real-time and retrospective forcing in the North American land data assimilation system (NLDAS) project

    J. Geophys. Res.-Atmos.

    (2003)
  • Department of the Interior, U.S.G.S

    Landsat Fractional Snow Covered Area (fSCA) Algorithm Description Document (ADD)
