Multi-sensor fusion using random forests for daily fractional snow cover at 30 m
Introduction
Between one sixth and one quarter of the global population depends on water from snow and ice melt (Barnett et al., 2005; Mankin et al., 2015). Snow is a natural reservoir of water, but its complex spatial and temporal distribution makes accurate snow cover mapping and estimation of snow water equivalent (SWE) difficult. These difficulties manifest as runoff forecast errors and can have significant impacts on water managers (Stillinger et al., 2021), agricultural users, and others who depend on accurate water allocation estimates.
In addition to its impact on water resources, snow influences the survival of mountain ungulates and profoundly affects the energetic costs of living in colder environments (Conner et al., 2018; Stephenson et al., 2020). In mountainous or northern climates, snow determines the availability of forage and the energy that must be expended to acquire food for much of the year (Parker et al., 1984; Schwab and Pitt, 1987). For example, bighorn sheep in alpine environments exhibit multiple migratory behaviors including alpine residency during winter (Spitz et al., 2018) and select habitat in response to snowscapes. Finer scale snow metrics improve predictive models of the movement and resource selection of these animals (Mahoney et al., 2018). Consequently, snow cover is an essential parameter in habitat models designed to predict resource selection (Spitz et al., 2020).
To better understand these impacts of snow cover on both humans and wildlife, we need to look at the challenges surrounding the satellite retrieval of fractional snow cover (fSCA) and related SWE estimates. Calculating SWE requires accurate estimates of fSCA, e.g., in energy balance models (Bair et al., 2016; Rittger et al., 2016). In a machine learning approach to predicting SWE, fSCA was the most important predictor (Bair et al., 2018a). However, accurate estimates of fSCA are difficult to obtain at both high spatial and high temporal resolution. This is especially true in mountainous terrain, where the spatial variability of snow caused by wind redistribution (Liston and Sturm, 1998) and other factors can be substantial. For example, Airborne Snow Observatory (ASO, Painter et al., 2016) retrievals that use a combination of lidar and hyperspectral sensors to estimate SWE at 50 m resolution routinely produce adjacent pixels with an order of magnitude difference in SWE. Likewise, bare and snow-covered areas are often found within a few meters of each other because of differences in topography or vegetation (Rosenthal and Dozier, 1996; Selkowitz et al., 2014). To further complicate matters, areal snow cover can change on the order of hours, especially during melt periods, when snow can be present in the morning and gone by afternoon. These rapid temporal changes affect the accuracy of snowmelt modeling, which requires imagery with a temporal resolution of at least one day (Slater et al., 2013). SWE models forced with 30 m fSCA data have produced more accurate results than models forced with 500 m fSCA (Molotch and Margulis, 2008; Margulis et al., 2019).
Thus, accurate spatial models of snow distribution and snowmelt have two competing requirements: decameter scale spatial resolution and daily temporal resolution. No satellite or constellation of satellites currently satisfies both. Three widely used sensors for remote sensing of snow are MODIS (daily, 500 m resolution), Landsat 5, and Landsat 7 (16-day for each or 8-day combined, 30 m resolution). More recently, data are available from Landsat 8 (16-day, 30 m resolution) and Sentinel 2a and 2b (10-day for each or 5-day combined, 10, 20, or 60 m resolution depending on band). None of these satellites satisfies our requirements individually, nor do they in combination, though the scheduled launch of Landsat 9 in September 2021 will bring us one step closer.
The need for snow cover products with high temporal and spatial resolution motivates sensor fusion. A few studies have applied fusion techniques to take advantage of both resolutions, though many use binary snow cover for the coarse resolution data, the fine resolution data, or both. Baumgartner et al. (1987) compared Landsat Multispectral Scanner System with Advanced Very High Resolution Radiometer (AVHRR) imagery and suggested the two could be fused, but stopped short of creating a fused product. Durand et al. (2008) created a fused MODIS and Landsat product. The researchers combined fSCA derived from coarsening the MOD10A v.4 binary snow cover data (Hall et al., 2002) to 1 km with 30 m fractional snow cover estimated from Landsat 7 Enhanced Thematic Mapper+ (ETM+) reflectance data (Painter et al., 2003; Painter et al., 2009). The novel linear technique they used to combine the two data streams, when applied to the upper Rio Grande in Colorado (US) to constrain a SWE reconstruction model, resulted in a 51% reduction in mean absolute error (MAE) and a 49% reduction in bias for SWE. Likewise, Durand et al. (2008) found that using ETM+ data upscaled to 100 m provided substantial improvement in SWE, with 23% MAE compared to 50% MAE for the coarser MODIS data.
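The coarsening of binary snow cover into a fractional value can be illustrated with a minimal sketch. It assumes simple averaging over non-overlapping square blocks; the function name `coarsen_binary_to_fsca`, the block size, and the toy grid are hypothetical and are not taken from Durand et al. (2008).

```python
def coarsen_binary_to_fsca(binary, factor):
    """Average non-overlapping factor x factor blocks of a 0/1 snow grid.

    Assumption: plain block averaging; the cited study may have used a
    different aggregation.
    """
    rows, cols = len(binary), len(binary[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            # Collect every fine pixel that falls inside this coarse cell.
            block = [binary[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Toy 4 x 4 binary snow map (1 = snow, 0 = no snow), purely illustrative.
fine = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
]
fsca = coarsen_binary_to_fsca(fine, 2)
```

Each coarse cell then holds the fraction of its fine pixels flagged as snow, which is why a binary product can yield a fractional estimate once it is aggregated to a coarser grid.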
Like Durand et al. (2008), Berman et al. (2018) used MODIS and Landsat snow maps but took an alternate approach with a dynamic time warping technique to create daily 30 m snow cover estimates. Instead of binary snow cover, they used fractional snow cover from MOD10A1 (Salomonson and Appel, 2004). Berman et al. (2018) report a Root Mean Squared Error (RMSE) of 31% to 68% for fSCA, validated with ground measurements (snow pillows and time-lapse cameras). It is notable that the regions selected in that study are particularly challenging forest locations for optical remote sensing, with 60% to >80% canopy cover. High MODIS sensor view zenith angles that can completely obstruct snow cover in forests (Dozier et al., 2008; Rittger et al., 2020) were not accounted for, and only cloud-free (not gap-filled or smoothed) imagery (Hall et al., 2010) was used, thereby greatly reducing the effective temporal resolution of the inputs. Several studies have downscaled and compared snow cover between products, e.g., Landsat with MODIS snow cover data (Walters et al., 2014; Li et al., 2015), and find reasonable accuracy, but the maps are binary, and binary maps overestimate snow at high fractions and underestimate snow at low fractions (Rittger et al., 2013). A number of previous studies focusing on snow cover but not SWE have combined MODIS data and binary snow maps from Landsat at 30 m or unmanned aerial vehicles at similar or higher spatial resolution (Dobreva and Klein, 2011; Moosavi et al., 2014; Liang et al., 2017; Kuter et al., 2018; Liu et al., 2020; Kuter, 2021). The studies used varying methods such as linear regression, multivariate adaptive regression splines, neural networks, support vector machines, and random forests, but focused on improving the accuracy of daily maps at 500 m rather than providing a higher spatial resolution daily product.
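The direction of the binary-map error can be seen in a small worked example. This is a sketch under stated assumptions: the 0.5 threshold, the helper names `to_binary`, `bias`, and `rmse`, and the toy fSCA values are all hypothetical and only illustrate the arithmetic.

```python
import math


def to_binary(fsca, threshold=0.5):
    """Convert fractional snow cover to 0/1 using an assumed threshold."""
    return [1.0 if f >= threshold else 0.0 for f in fsca]


def bias(pred, obs):
    """Mean signed error; positive values mean overestimation."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


# Toy "true" fractional values, chosen only for illustration.
high = [0.75, 0.75]  # mostly snow-covered pixels
low = [0.25, 0.25]   # mostly snow-free pixels

bias_high = bias(to_binary(high), high)  # binary map reports 1.0 for both
bias_low = bias(to_binary(low), low)     # binary map reports 0.0 for both
rmse_all = rmse(to_binary(high + low), high + low)
```

Here the binary map is biased high for the mostly snow-covered pixels and biased low for the mostly snow-free pixels. Averaged over all four pixels the signed errors cancel, while the RMSE does not, which is one reason both metrics are usually reported.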
A goal of data fusion is that the fused product is more consistent or accurate than the individual parts. These earlier fusion methods used either the Normalized Difference Snow Index (NDSI) or binary versions of the MOD10A snow cover product. With improved fSCA retrieval algorithms for both MODIS and Landsat input data streams, along with appropriate treatment of data gaps, clouds, high sensor view angles, and saturation, an improved daily moderate-resolution snow cover fusion product may be realized. While the combination of the newest generation satellites, Landsat 8, Sentinel 2a, and Sentinel 2b, provides observations of Earth's surface approximately every three days (Claverie et al., 2018), these new data do not address snow mapping in the historical context. In addition, as previously noted, snow can still accumulate, melt, or both within a three-day period, supporting the need for fusion in both the historical and future contexts. An analysis comparing the 30 m snow maps at the three-day interval to MODIS snow maps will inform our understanding of how accurately the methods described later estimate daily snow cover, before interpolation or fusion methods are applied.
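For reference, the NDSI mentioned above is conventionally defined as a normalized band ratio. The symbols below are notation assumed here: $R_{\mathrm{green}}$ is reflectance in a visible green band and $R_{\mathrm{SWIR}}$ is reflectance in a shortwave infrared band.

```latex
\[
  \mathrm{NDSI} = \frac{R_{\mathrm{green}} - R_{\mathrm{SWIR}}}{R_{\mathrm{green}} + R_{\mathrm{SWIR}}}
\]
```

Because snow is bright in the visible and dark in the shortwave infrared, snow-covered pixels tend toward high NDSI values. A binary snow map then follows from thresholding the index, which discards the sub-pixel fraction that spectral mixture analysis aims to retrieve.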
Compared to previous methods, the work described in this paper represents a step forward through the use of spectral mixture analysis for snow cover mapping at both the Landsat and MODIS resolutions, together with a new machine learning approach to combine them. Section 2 describes the study area in the Sierra Nevada, USA, the satellite data, which include observations from Landsat and MODIS, and the algorithms, including spectral mixture analysis, used to estimate fSCA. Section 2 also describes the fusion methods. In particular, we use random forests, but introduce a novel two-stage conditional approach that generalizes the original random forest concept. Section 3 presents the validation metrics and sampling. Section 4 shows the results and discusses the training sample size analysis, variable importance, seasonal performance, and limitations of the approach. Section 5 concludes with a summary and future directions.
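The control flow of a two-stage conditional predictor can be sketched as follows. This is a hedged illustration, not the authors' implementation: the text here does not specify the stages, so the split into a classification stage (no snow, mixed, full snow) followed by a regression stage for mixed pixels is an assumption, and the stand-in functions `toy_classify` and `toy_regress`, their thresholds, and the feature names `modis_fsca` and `elev_anomaly` are hypothetical. In practice each stage would be a fitted random forest.

```python
NO_SNOW, MIXED, FULL_SNOW = 0, 1, 2


def two_stage_predict(features, classify, regress):
    """Stage 1 picks a class; stage 2 runs only for pixels classed as MIXED."""
    label = classify(features)
    if label == NO_SNOW:
        return 0.0
    if label == FULL_SNOW:
        return 1.0
    # Clamp the regression output to the valid fSCA range [0, 1].
    return min(1.0, max(0.0, regress(features)))


def toy_classify(features):
    """Hypothetical stand-in for a fitted random forest classifier."""
    coarse = features["modis_fsca"]
    if coarse <= 0.05:
        return NO_SNOW
    if coarse >= 0.95:
        return FULL_SNOW
    return MIXED


def toy_regress(features):
    """Hypothetical stand-in for a fitted random forest regressor."""
    return features["modis_fsca"] + 0.25 * features["elev_anomaly"]


pixels = [
    {"modis_fsca": 0.0, "elev_anomaly": 1.0},
    {"modis_fsca": 0.5, "elev_anomaly": 0.5},
    {"modis_fsca": 0.875, "elev_anomaly": 1.0},
    {"modis_fsca": 1.0, "elev_anomaly": -1.0},
]
predictions = [two_stage_predict(p, toy_classify, toy_regress) for p in pixels]
```

One motivation for conditioning in this way, under the assumption above, is that fSCA values pile up at exactly 0 and 1, and a single regression tends to smear those endpoints; handling them in a separate classification stage lets the regression focus on the mixed pixels.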
Data and methods
Our study region in the Sierra Nevada, USA, spans from the North Fork of the American River basin in the north to the South Fork of the Kern River basin in the south (Fig. 1). This area of the Sierra Nevada is characterized by high vertical relief (up to 3000 m), with areas on the west side of the crest receiving far more precipitation than those to the east. For example, SWE on the ground measured at 2940 m on Mammoth Mountain near the crest reaches an annual peak of 128 cm (Bair et al., 2018b),
Validation metrics and sampling
There are several predictor variables with missing (not-applicable) or null values and also Landsat images with null values over the study region, for example, in cloudy or deeply shaded areas and large water bodies. These pixels are removed from validation and sampling along with pixels saturated in all three bands as described in Section 2.1.2. Pixels saturated in all three bands would lead to an overestimate in snow cover (Rittger et al., 2021). To grow the random forests, we select a subset
Results and discussion
To illustrate the fSCA based on our approach, we consider two example days with distinct climatological characteristics. For the ensuing results, we fit the two-stage model and downscale on a cluster with 193 GB of RAM and 2× Intel Xeon Gold 6130, 2.1 GHz 16-core Skylake processors. Fig. 4 shows data for January 25, 2006, and March 17, 2007, which were unusually dry and wet years respectively (Rittger et al., 2016). The MODIS fSCA used as a predictor is shown in Fig. 4a and d, downscaled
Conclusion
We used a scale-invariant, two-stage random forest model along with the best available fractional snow cover estimates from spectral mixture analysis to create daily snow cover maps at 30 m. Validation metrics show that, with adequate sampling over time and space, we can combine infrequent snow mapping from 30 m data with daily data at 500 m to produce accurate daily maps of snow at 30 m resolution. The model performed similarly in all seasons of the year and at different aspects with larger
Funding
California Department of Fish and Wildlife; NASA award 80NSSC18K1489; NASA award 80NSSC18K0427; NOAA award NA18OAR4590380; and University of California award LFR-18-54831.
Data statement
To satisfy NASA open data policies, all data are available in online repositories in GeoTIFF format (Rittger, 2021): Landsat and MODIS snow cover, the predictors from Table 1, and snow cover from the two-stage random forest model.
The two-stage random forest model is currently available in a public GitHub repository (Krock et al., 2019).
CRediT authorship contribution statement
Karl Rittger: Conceptualization, Methodology, Formal analysis, Data curation, Visualization, Writing – original draft, Writing – review & editing, Project administration, Funding acquisition. Mitchell Krock: Methodology, Software, Validation, Formal analysis, Investigation, Visualization, Writing – original draft, Writing – review & editing. William Kleiber: Methodology, Software, Validation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (60)
- et al. Daily estimates of Landsat fractional snow cover driven by MODIS and dynamic time-warping. Remote Sens. Environ. (2018)
- et al. The harmonized Landsat and Sentinel-2 surface reflectance data set. Remote Sens. Environ. (2018)
- et al. Fractional snow cover mapping through artificial neural network analysis of MODIS surface reflectance. Remote Sens. Environ. (2011)
- et al. Time-space continuity of daily maps of fractional snow cover and albedo from MODIS. Adv. Water Resour. (2008)
- et al. Merging complementary remote sensing datasets in the context of snow water equivalent reconstruction. Remote Sens. Environ. (2008)
- et al. MODIS snow-cover products. Remote Sens. Environ. (2002)
- et al. Development and evaluation of a cloud-gap-filled MODIS daily snow-cover product. Remote Sens. Environ. (2010)
- et al. Retrieval of fractional snow covered area from MODIS data by multivariate adaptive regression splines. Remote Sens. Environ. (2018)
- et al. Fractional snow-cover mapping based on MODIS and UAV data over the Tibetan plateau. Remote Sens. (2017)
- et al. Estimating the distribution of snow water equivalent using remotely sensed snow cover data and a spatially distributed snowmelt model: a multi-resolution, multi-sensor comparison. Adv. Water Resour. (2008)