Calibrating SLEUTH with big data: Projecting California's land use to 2100
Introduction
Land use change is a principal driver of global change, and among all land use transitions, the spread of urban areas has a dominant negative impact through greenhouse gas emissions, the loss of natural space, farmland, and biodiversity, and increased congestion and environmental pollution (Foley, DeFries, Asner, Barford, et al., 2005). The mapping, modeling, and forecasting of these changes are critical for anticipating and mitigating these consequences. Among the most popular and successful land use change models are those based on cellular automata (Clarke, 2019). SLEUTH is a mature land use change and urban growth cellular automaton simulation model that ingests gridded data and creates probabilistic simulations of future land use states (Chaudhuri & Clarke, 2013). The cellular automaton embeds spread rules that govern urban growth, and it models class-to-class land use changes with a Markovian transition matrix computed from past changes. Modeling consists of preparing the input data, testing the model code, using past data to calibrate the model's behavioral coefficients, and allowing the model to run into the future to create scenarios of growth and change. Traditionally, calibration used a brute force method that adjusted the coefficients to best fit the prior data (Silva & Clarke, 2002). However, this brute force method is both labor- and CPU-intensive, which has proven to be a barrier to the model's application.
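The Markovian idea mentioned above can be illustrated with a minimal sketch. It assumes two co-registered land use grids coded as small integers, cross-tabulates the classes at the two dates, and normalizes each row into transition probabilities. The function name and the toy grids are hypothetical, and this is not SLEUTH's own code.

```python
from collections import Counter

def transition_matrix(earlier, later, n_classes):
    """Estimate class-to-class transition probabilities from two grids."""
    # Count how many pixels moved from class a (earlier) to class b (later).
    counts = Counter()
    for row_a, row_b in zip(earlier, later):
        for a, b in zip(row_a, row_b):
            counts[(a, b)] += 1
    matrix = []
    for i in range(n_classes):
        total = sum(counts[(i, j)] for j in range(n_classes))
        # Rows for classes absent at the earlier date stay all zero.
        matrix.append([counts[(i, j)] / total if total else 0.0
                       for j in range(n_classes)])
    return matrix

# Toy 3x3 grids; classes 0, 1, 2 stand in for urban, agriculture, forest.
lu_t1 = [[1, 1, 2], [1, 2, 2], [0, 1, 2]]
lu_t2 = [[0, 1, 2], [1, 2, 2], [0, 0, 1]]
P = transition_matrix(lu_t1, lu_t2, n_classes=3)
for row in P:
    print(row)
```

Each row of `P` sums to one for any class present at the earlier date, so a row can be read as the empirical probability that a pixel of that class persists or converts to each other class over the interval.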
As such, the SLEUTH calibration process has been studied in detail (Clarke, 2008; Clarke, Hoppen, & Gaydos, 1996). Several studies explored the use of parallel and high-performance computing to decrease the calibration time (Chaudhuri & Foley, 2019; Guan & Clarke, 2010). Others investigated the sensitivity of the model to the number of Monte Carlo iterations used (Goldstein, Dietzel, & Clarke, 2005); the duration used for calibration and forecasting (Peiman & Clarke, 2014); the changes made by the self-modification rules (Saxena & Jat, 2018); the means of including past and future exclusions (Akin, Clarke, & Berberoglu, 2014; Onsted & Clarke, 2012); and the use of alternative goodness of fit measures, such as landscape metrics (Herold, Couclelis, & Clarke, 2005). Sakieh, Salmanmahiny, and Mirkarimi (2016) tested alternative models against SLEUTH, such as logistic regression and a multi-layer perceptron. Dietzel and Clarke (2005) explored the scale effect of disaggregating land use classes on model calibration and forecasts, while Jantz and Goetz (2005) explored the impact of scale geographically.
Significant changes to SLEUTH calibration involved the analysis of model behavior to detect correlation among the original 13 fit metrics, which resulted in a subset of 8 being used for the Optimal SLEUTH metric (Dietzel & Clarke, 2007). Other work offered new versions of the model with different means of calibration (Jantz, Drzyzga, & Maret, 2014; CAGIS, 2019; Jantz, Goetz, Donato, & Claggett, 2010; Houet, Aguejdad, Doukari, Battaia, & Clarke, 2016). Liu, Sun, Yang, Su, and Qi (2012) made two modifications to improve SLEUTH: using ant colony optimization for calibration, and calibrating by sub-region rather than over the entire study area. Both modifications improved calibration accuracy and efficiency compared with the original SLEUTH applications. Wu et al. (2009) evaluated the simulation performance of the SLEUTH urban growth model in the Shenyang metropolitan area of China using the relative operating characteristic (ROC) curve statistic, a multiple-resolution error budget, and landscape metrics for comparison and validation.
Probably the most significant improvement for SLEUTH calibration was the conversion of the calibration method from brute force to a genetic algorithm. After initial experiments with the new method (SLEUTH-GA) (Clarke-Lauer & Clarke, 2011), a more complete set of procedures and constants was derived for more general application, and the code was added to the SLEUTH website (Clarke, 2017; Clarke, 2018). Testing showed that SLEUTH-GA could calibrate the model with a computational speed-up of between 3 and 22 times. This means that a single city or regional data set can be calibrated in days instead of years. Jafarnezhad, Salmanmahiny, and Sakieh (2015) also showed the advantages of the genetic algorithm, which include fully automating the calibration process and removing any remaining subjective human choice.
Zhou, Varquez, and Kanda (2019) used the historical distribution of global population as a proxy for urban land cover, to calibrate SLEUTH for the period 2000 to 2013. This simulation used two urban growth layers as 50 arc-minute grids to simulate global urban cover, which they forecast to reach 1.7 × 10^6 km² by 2050. The modeling used partitioned data (tiles) and repeat applications by region to get global extent. To reduce the computational load in the calibration, not all coefficients were varied, and data tiles with less than about 25 km² of urban area according to the 2012 global urban map were excluded. The 2012 global urban extent data were averaged from Landscan data for 2012 and 2013 (Oak Ridge National Laboratory, 2020). Data partitioning also enabled SLEUTH application for other massive data sets, for example all of Italy (Martellozzo, Amato, Murgante, & Clarke, 2018). With these developments, SLEUTH has gained the ability to not only deal with big data, but to ensure accurate and timely calibration even with massive extents and high resolution.
In the present study, we took advantage of these SLEUTH improvements to simulate the State of California for the entire 21st century. In addition, we used a very high spatial resolution (30 m) for which data were available. The purpose of the study is to investigate the spatial consistency of SLEUTH. Our research question is: can we create both a reliable forecast of land use in California (excluding the Channel Islands) for the year 2100 CE, and an assessment of the forecast's reliability? A null hypothesis is that all data divisions of California would give similar calibration outcomes (goodness of fit and coefficient values), so that a uniform simulated rate of growth would apply to future simulations. Failing this, is there spatial autocorrelation among the calibrated values? Finally, we explored the spatial distribution of the rules that govern land use changes, and the uncertainty inherent within them, as revealed by the SLEUTH model.
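To make the spatial autocorrelation question concrete, the following minimal sketch computes Moran's I, one common statistic for this purpose, over a rectangular grid of per-tile values with rook (edge-sharing) neighbors. The choice of Moran's I, the regular grid layout, and the toy values are assumptions for illustration; the text above does not state which statistic was used.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid using binary rook adjacency."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    dev = [[v - mean for v in row] for row in grid]
    denom = sum(d * d for row in dev for d in row)
    if denom == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    num = 0.0
    w_total = 0  # sum of all (directed) neighbor weights
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += dev[r][c] * dev[rr][cc]
                    w_total += 1
    return (n / w_total) * (num / denom)

# Hypothetical per-tile coefficient values: clustered versus alternating.
clustered = [[1, 1, 0, 0]] * 4
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(morans_i(clustered), morans_i(checkerboard))
```

Values well above zero indicate that neighboring tiles tend to share similar calibration results, values near zero suggest no spatial pattern, and negative values indicate alternation between neighbors.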
Compiling the data
SLEUTH requires data for slope, land use, exclusions, urban extent, transportation and hillshade, where the last is used for visualization and not a part of the model. Percent slope and hillshade were calculated from the 1 arc sec elevation data in the USGS National Elevation Dataset. California Land Use was extracted from the National Land Cover 30 m data for 2001, 2006, and 2011, derived from Landsat satellite data as part of the Multi-Resolution Land Characterization (MRLC) project (See: //www.mrlc.gov/about
Tiling
Each final data set contained 789,777,829 pixels, and since each model run used 6 urban layers, 3 roads layers, 2 land use layers, 1 exclusion, 1 slope and 1 hillshade layer, each model calibration used 14 times the data of the base rasters for a grand total of 1.1 × 10^10 pixels. The whole map was assembled as GeoTIFFs in the Albers equal area map projection at the Landsat native 30 m pixel size. The raster datasets were so large that the SLEUTH model was unable to allocate sufficient memory
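The layer arithmetic in this section can be checked with a short sketch. The pixel count and layer counts come from the text; the memory estimate is an added assumption of one byte per pixel held uncompressed, which is only a rough illustration of why tiling was needed.

```python
PIXELS_PER_LAYER = 789_777_829  # pixels in each statewide raster (from the text)

# Input layers used in each model run, as listed in the text.
layers = {
    "urban": 6,
    "roads": 3,
    "land use": 2,
    "exclusion": 1,
    "slope": 1,
    "hillshade": 1,
}

n_layers = sum(layers.values())
total_pixels = PIXELS_PER_LAYER * n_layers

# Assumption: 1 byte per pixel, uncompressed, all layers resident at once.
approx_gib = total_pixels / 2**30

print(f"{n_layers} layers, {total_pixels:,} pixels ({total_pixels:.1e})")
print(f"about {approx_gib:.1f} GiB at one byte per pixel")
```

Under that assumption the full stack of inputs is on the order of 10 GiB before any working arrays are allocated.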
Executing the calibration
SLEUTH calibration uses scenario files that hold all of the file identification and parameter data needed for a SLEUTH model calibration run. Separate scenario files were prepared for each of the 8 major zones, and these were then modified manually to cycle through each of the subtiles. All input data for each zone were maintained in a separate folder in the 8-bit GIF format required by SLEUTH. Calibration consisted of invoking the SLEUTH-GA code using the Cygwin UNIX emulator for MS Windows. It was
Calibration performance
The calibration period was 2001–2017, where the 2017 land use data were assumed unchanged from 2011. This assumption was needed to align the most current urban data (from USDA) with the land use data (from USGS), which were not available for 2017 at the time of the study (data for 2016 have subsequently become available). The results were explored first in terms of performance. An ideal model is able to replicate exactly the last calibration time period (2017) and all intermediate data sets,
Model behavior and parameters
The genetic algorithm creates an initial chromosome filled with random integer values in the {0…100} range. These are then changed in subsequent generations by crossover, competition, replacement, and mutation to evolve the final best fit parameters. Replacement ensures that new random genes can jump to the top and become the most fit at any time. The calibrations were run from a UNIX C-shell script as follows:
date > ../Output/CaliforniaZone6/DD/CAcal-DD
../grow.exe evolve
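The evolutionary steps described in this section (random initial chromosomes in the 0 to 100 range, competition, crossover, mutation, and replacement) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the SLEUTH-GA code: the fitness function is a hypothetical placeholder for scoring a SLEUTH run, and the population size, mutation rate, and replacement count are arbitrary.

```python
import random

N_GENES = 5         # one gene per SLEUTH behavioral coefficient
LOW, HIGH = 0, 100  # coefficient range given in the text

def random_chromosome(rng):
    return [rng.randint(LOW, HIGH) for _ in range(N_GENES)]

def toy_fitness(chrom, target=(20, 40, 60, 30, 80)):
    """Hypothetical stand-in: a real run would score a SLEUTH simulation."""
    return -sum(abs(c - t) for c, t in zip(chrom, target))

def evolve(pop_size=20, generations=60, mutation_rate=0.1, n_replace=2, seed=1):
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Competition: rank by fitness and keep the better half as parents.
        population.sort(key=toy_fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents) - n_replace:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randint(1, N_GENES - 1)  # single-point crossover
            child = mom[:cut] + dad[cut:]
            # Mutation: occasionally redraw a gene at random.
            child = [rng.randint(LOW, HIGH) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        # Replacement: inject fresh random chromosomes every generation.
        newcomers = [random_chromosome(rng) for _ in range(n_replace)]
        population = parents + children + newcomers
    return max(population, key=toy_fitness)

best = evolve()
print(best, toy_fitness(best))
```

Because the better half of each generation is carried forward unchanged, the best fitness never decreases, while the newcomers give a fresh random chromosome a chance to become the most fit at any generation.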
Mapping the calibration
Choropleth maps were prepared by subtile for all of the metrics using R. First, the 5 performance measures were plotted (Fig. 10).
The maximum and mean OSM values showed the most range, and similar spatial distributions with highs in northern California (Zones 1 and 2), and lows in Central California (Zone 5 W). The independent metrics were universally high, with lows in Zone 5 W, but even there interspersed with high values.
Similar maps were prepared for the five best fit SLEUTH coefficients
Forecasting with SLEUTH
With the calibration complete, the next stage was to run simulations using the best sets of coefficients for the period 2017–2100. As forecasts, there can be no accuracy measure comparable to the calibration, but the accuracy measures are believed to be indicators of the confidence or reliability of the forecasts. Maps of the actual forecasts are to be published elsewhere. As a simple summary, Table 5 lists the numbers of hectares in each of the 13 land use classes for 2001 and 2100. Notable is
Uncertainty in the calibrations and forecast
There are three sources of uncertainty in the SLEUTH modeling effort: the data, the calibration and the forecasts. The MRLC land use data are based on Landsat satellite and other data, and the extraction of classified land use from such data is known to be imperfect for reasons of the vagueness in class descriptions, imprecision in class semantics and other inaccuracy in the classification process. That said, Wickham et al. (2017) found that the MRLC single-date overall accuracies were 82%,
Big data and SLEUTH
Lessons were learned through having to deal with massive amounts of data in SLEUTH modeling. Three keys to bulk data processing were: (1) tiling the data into manageable chunks that allowed each subtile to be calibrated (and forecast) largely independently; (2) reducing the number of data layers and using the genetic algorithm, which made the model application tractable in terms of CPU time; and (3) adopting a tile naming convention that made it possible to repeatedly use the same scenario files and
Conclusion
This study set out to investigate whether spatial autocorrelation exists among SLEUTH calibrations when massive data sets are spatially tiled to make computations possible. Data for the state of California on land use and other factors were used at a 30 m resolution, which required division into State Plane zones, then tiles, and in most cases subtiles. Of the 192 subtiles, only 174 contained data for calibration. SLEUTH was calibrated for these subtiles using repeated changes to scenario files
References (35)
- et al.
The impact of historical exclusion on the calibration of the SLEUTH urban growth model
International Journal of Applied Earth Observation and Geoinformation
(2014)
- et al.
The role of spatial metrics in the analysis and modeling of urban land use change
Computers, Environment and Urban Systems
(2005)
- et al.
Designing and implementing a regional urban modeling system using the SLEUTH cellular urban model
Computers, Environment and Urban Systems
(2010)
- et al.
Calibration of the SLEUTH urban growth model for Lisbon and Porto, Portugal
Computers, Environment and Urban Systems
(2002)
- et al.
Thematic accuracy assessment of the 2011 National Land Cover Database (NLCD)
Remote Sensing of Environment
(2017)
- et al.
High-resolution global urban growth projection based on multiple applications of the SLEUTH urban growth model
Nature: Scientific Data
(2019)
SLEUTH3R (Smart SLEUTH)
- et al.
The SLEUTH land use change model: A review
International Journal of Environmental Resources Research
(2013)
- et al.
DSLEUTH: a distributed version of SLEUTH urban growth model
Improving SLEUTH calibration with a genetic algorithm