Calibrating SLEUTH with big data: Projecting California's land use to 2100

https://doi.org/10.1016/j.compenvurbsys.2020.101525

Highlights

  • Investigated the spatial consistency of the SLEUTH model using a massive data set.

  • The large size of the data set required data tiling and 174 separate calibrations using a genetic algorithm.

  • A null hypothesis that all tiles would give similar calibration outcomes was proven false by mapping and Moran's I values.

  • 99% of forecast growth in California comes from outward spread from new and existing settlements.

  • Examines the uncertainty of the SLEUTH forecasts.

Abstract

This study investigated the spatial consistency of the SLEUTH urban growth and land use change model using a massive data set. The research asks whether SLEUTH can yield both a reliable forecast of land use in the state of California for the year 2100 CE and an assessment of the forecast's reliability. Data were prepared, and SLEUTH was calibrated for 174 tiles made by partitioning the data within the 6 California State Plane Zones. A null hypothesis that all data divisions of California would give similar calibration outcomes, so that a uniform simulated rate of growth would apply to statewide future simulations, was proven false by mapping and Moran's I values. Spatial autocorrelation was found to propagate forward into the SLEUTH forecasts, resulting in major differences within the state in land use change and change rates. We also explored the spatial distribution of the rules that changed pixels between land use classes, finding that almost 99% of forecast growth in California comes from outward spread from new and existing settlements. The paper concludes with an examination of the uncertainty inherent within, and displayed by, the SLEUTH forecasts.

Introduction

Land use change is a principal driver of global change, and among all land use transitions, the spread of urban areas has a dominant negative impact on greenhouse gas emissions, loss of natural space, farmland and biodiversity, along with increased congestion and environmental pollution (Foley, DeFries, Asner, Barford, et al., 2005). The mapping, modeling, and forecasting of these changes are of critical concern for anticipating and mitigating these negative consequences. Among the most popular and successful land use change models are those based on cellular automata (Clarke, 2019). SLEUTH is a mature land use change and urban growth cellular automaton simulation model that ingests gridded data, and creates probabilistic simulations of future land use states (Chaudhuri & Clarke, 2013). The cellular automaton embeds rules governing urban growth based on spread rules, and class-to-class land use changes based on a Markovian transition matrix computed from past changes. Modeling consists of preparing the input data, testing the model code, using past data to calibrate the model's behavioral coefficients, and allowing the model to run into the future to create scenarios of growth and change. Traditionally, the calibration sequence consisted of a brute force method that adjusted the coefficients to best fit the prior data (Silva & Clarke, 2002). However, this brute force method is both labor and CPU intensive, which has proven to be a barrier to the model's application.
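As an illustration of the Markovian transition matrix mentioned above, the following minimal sketch estimates class-to-class transition probabilities from two land use grids of the same area at different dates. It is not SLEUTH's own code (SLEUTH is distributed as a separate program); the function name, the toy grids, the class codes, and the choice to let an absent class persist are all assumptions made for the example.

```python
from collections import Counter

def transition_matrix(before, after, n_classes):
    """Row-normalised class-to-class transition probabilities between two dates."""
    counts = Counter()
    for row_b, row_a in zip(before, after):
        for b, a in zip(row_b, row_a):
            counts[(b, a)] += 1  # one pixel went from class b to class a
    matrix = []
    for i in range(n_classes):
        row_total = sum(counts[(i, j)] for j in range(n_classes))
        if row_total == 0:
            # Assumption: a class absent at the first date is treated as persistent.
            matrix.append([1.0 if i == j else 0.0 for j in range(n_classes)])
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_classes)])
    return matrix

# Hypothetical toy grids; class codes 0 = urban, 1 = agriculture, 2 = forest.
before = [[1, 1, 2],
          [1, 0, 2],
          [2, 2, 2]]
after = [[0, 1, 2],
         [1, 0, 2],
         [2, 1, 2]]

P = transition_matrix(before, after, 3)
for row in P:
    print([round(p, 3) for p in row])
```

Each row of `P` sums to 1 and gives the observed share of pixels in a starting class that ended in each class at the later date.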

As such, the SLEUTH calibration process has been studied in detail (Clarke, 2008; Clarke, Hoppen, & Gaydos, 1996). Several studies explored the use of parallel and high-performance computing to decrease the calibration time (Chaudhuri & Foley, 2019; Guan & Clarke, 2010). Others investigated the sensitivity of the model to the number of Monte Carlo iterations used (Goldstein, Dietzel, & Clarke, 2005); the duration used for calibration and forecasting (Peiman & Clarke, 2014); the changes made by the self-modification rules (Saxena & Jat, 2018); the means of including past and future exclusions (Akin, Clarke, & Berberoglu, 2014; Onsted & Clarke, 2012); and the use of alternative goodness of fit measures, such as landscape metrics (Herold, Couclelis, & Clarke, 2005). Sakieh, Salmanmahiny, and Mirkarimi (2016) tested alternative models against SLEUTH, such as logistic regression and a multi-layer perceptron. Dietzel and Clarke (2005) explored the scale effect of disaggregating land use classes on model calibration and forecasts, while Jantz and Goetz (2005) explored the impact of scale geographically.

Significant changes to SLEUTH calibration involved the analysis of model behavior to detect correlation among the original 13 fit metrics, which resulted in a subset of 8 being used for the Optimal SLEUTH metric (Dietzel & Clarke, 2007). Other work offered new versions of the model with different means of calibration (Jantz, Drzyzga, & Maret, 2014: CAGIS, 2019; Jantz, Goetz, Donato, & Claggett, 2010; Houet, Aguejdad, Doukari, Battaia, & Clarke, 2016). Liu, Sun, Yang, Su, and Qi (2012) made two modifications to improve SLEUTH: using ant colony optimization to calibrate and performing sub-regional calibration to replace calibration of the entire study area. Both modifications improved the calibration accuracy and efficiency compared with the original SLEUTH applications. Wu et al. (2009) employed the relative operating characteristic (ROC) curve statistic, multiple-resolution error budget, and landscape metrics for comparison and validation to evaluate the simulation performance of the SLEUTH urban growth model in the Shenyang metropolitan area of China.

Probably the most significant improvement for SLEUTH calibration was the conversion of the calibration method from brute force to a genetic algorithm. After initial experiments with the new method (SLEUTH-GA) (Clarke-Lauer & Clarke, 2011), a more complete set of procedures and constants was derived for more general application, and the code was added to the SLEUTH website (Clarke, 2017; Clarke, 2018). Testing showed that SLEUTH-GA could calibrate the model with a computational speed-up of between 3 and 22 times. This means that a single city or regional data set can be calibrated in days instead of years. Jafarnezhad, Salmanmahiny, and Sakieh (2015) also showed the advantages of the genetic algorithm, which include fully automating the calibration process and removing any remaining human subjective choice.

Zhou, Varquez, and Kanda (2019) used the historical distribution of global population as a proxy for urban land cover, to calibrate SLEUTH for the period 2000 to 2013. This simulation used two urban growth layers as 50 arc-minute grids to simulate global urban cover, which they forecast to reach 1.7 × 10^6 km^2 by 2050. The modeling used partitioned data (tiles) and repeat applications by region to get global extent. To reduce the computational load in the calibration, not all coefficients were varied, and data tiles with less than about 25 km^2 of urban area according to the 2012 global urban map were excluded. The 2012 global urban extent data were averaged from Landscan data for 2012 and 2013 (Oak Ridge National Laboratory, 2020). Data partitioning also enabled SLEUTH application for other massive data sets, for example all of Italy (Martellozzo, Amato, Murgante, & Clarke, 2018). With these developments, SLEUTH has gained the ability to not only deal with big data, but to ensure accurate and timely calibration even with massive extents and high resolution.

In the present study we took advantage of these SLEUTH improvements to simulate the State of California for the entire 21st century. In addition, we used a very high spatial resolution (30 m) for which data were available. The purpose of the study is to investigate the consistency of SLEUTH spatially. Our research question is: can we create both a reliable forecast of land use in California (excluding the Channel Islands) for the year 2100 CE and an assessment of the forecast's reliability? A null hypothesis is that all data divisions of California would give similar calibration outcomes (goodness of fit and coefficient values), so that a uniform simulated rate of growth will apply to future simulations. Failing this, is there spatial autocorrelation among the calibrated values? Finally, we explored the spatial distribution of the rules that govern land use changes, and the uncertainty inherent within them, as revealed by the SLEUTH model.
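The question of spatial autocorrelation among tile-level calibration values is the kind of question Moran's I addresses. The sketch below computes global Moran's I for values laid out on a grid of tiles, using the standard formula I = (n / S0) · Σ w_ij (x_i − x̄)(x_j − x̄) / Σ (x_i − x̄)^2. The spatial weights used in the study are not stated in this excerpt, so binary rook (edge-sharing) contiguity is an assumption here, as are the function name and the toy grids; `None` marks tiles with no data.

```python
def morans_i(grid):
    """Global Moran's I with binary rook contiguity; None marks tiles without data."""
    cells = {(r, c): v
             for r, row in enumerate(grid)
             for c, v in enumerate(row)
             if v is not None}
    n = len(cells)
    mean = sum(cells.values()) / n
    dev = {k: v - mean for k, v in cells.items()}
    denom = sum(d * d for d in dev.values())
    num = 0.0
    s0 = 0  # sum of all weights (each neighbour pair is counted in both directions)
    for (r, c), d in dev.items():
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (r + dr, c + dc)
            if nb in dev:
                num += d * dev[nb]
                s0 += 1
    if denom == 0 or s0 == 0:
        raise ValueError("Moran's I is undefined for constant values or isolated tiles")
    return (n / s0) * (num / denom)

# Hypothetical examples: alternating values versus clustered values.
checkerboard = [[1, 0],
                [0, 1]]
clustered = [[1, 1, 0, 0]]
print(morans_i(checkerboard))
print(morans_i(clustered))
```

Values near zero are consistent with spatial randomness, positive values with clustering of similar calibration outcomes, and negative values with alternation between neighbours.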

Section snippets

Compiling the data

SLEUTH requires data for slope, land use, exclusions, urban extent, transportation and hillshade, where the last is used for visualization and not a part of the model. Percent slope and hillshade were calculated from the 1 arc sec elevation data in the USGS National Elevation Dataset. California Land Use was extracted from the National Land Cover 30 m data for 2001, 2006, and 2011, derived from Landsat satellite data as part of the Multi-Resolution Land Characterization (MRLC) project (See: //www.mrlc.gov/about

Tiling

Each final data set contained 789,777,829 pixels, and since each model run used 6 urban layers, 3 roads layers, 2 land use layers, 1 exclusion, 1 slope and 1 hillshade layer, each model calibration used 14 times the data of the base rasters for a grand total of 1.1 × 10^10 pixels. The whole map was assembled as GeoTIFFs in the Albers equal area map projection at the Landsat native 30 m pixel size. The raster datasets were so large that the SLEUTH model was unable to allocate sufficient memory
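The pixel total quoted above can be checked directly, and the idea of cutting an oversized raster into windows can be sketched in a few lines. The layer counts and base pixel count come from the text; the regular-grid tiling function, its name, and the toy dimensions are assumptions for illustration only, since the study's tiles followed State Plane zones and subtiles rather than a uniform grid.

```python
BASE_PIXELS = 789_777_829  # pixels per statewide raster, from the text

# Layers used in each model run, as listed in the text.
LAYERS = {"urban": 6, "roads": 3, "landuse": 2,
          "exclusion": 1, "slope": 1, "hillshade": 1}

n_layers = sum(LAYERS.values())
total_pixels = BASE_PIXELS * n_layers
print(n_layers, total_pixels, f"{total_pixels:.1e}")

def tile_bounds(n_rows, n_cols, tile_rows, tile_cols):
    """Yield (row_start, row_stop, col_start, col_stop) windows covering a raster.

    Edge tiles are clipped so that every pixel falls in exactly one window.
    """
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            yield (r0, min(r0 + tile_rows, n_rows),
                   c0, min(c0 + tile_cols, n_cols))

# Hypothetical 5 x 7 raster cut into windows of at most 2 x 3 pixels.
tiles = list(tile_bounds(5, 7, 2, 3))
print(len(tiles), tiles[-1])
```

Because the windows do not overlap, each tile can be read, calibrated, and written separately without holding the full raster in memory.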

Executing the calibration

SLEUTH calibration uses scenario files that hold all needed file identification and parameter data for a SLEUTH model calibration run. Separate scenario files were prepared for each of the 8 major zones, and these were then modified manually to cycle through each of the subtiles. All input data for each zone was maintained in a separate folder in the 8-bit GIF format required by SLEUTH. Calibration consisted of invoking the SLEUTH-GA code using the Cygwin UNIX emulator for MS Windows. It was

Calibration performance

The calibration period was 2001–2017, where the 2017 land use data was assumed unchanged from 2011. This assumption was needed to align the most current urban data (from USDA) with the land use data (from USGS), which were not available for 2017 at the time of the study (data for 2016 have subsequently become available). The results were explored first in terms of performance. An ideal model is able to replicate exactly the last calibration time period (2017) and all intermediate data sets,

Model behavior and parameters

The genetic algorithm creates an initial chromosome filled with random integer values in the {0…100} range. These are then changed in subsequent generations by cross over, competition, replacement, and mutation to evolve the final best fit parameters. Replacement ensures that new random genes can jump to the top and become the most fit at any time. The calibrations were run from a UNIX C-shell script as follows:

date > ../Output/CaliforniaZone6/DD/CAcal-DD

../grow.exe evolve
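The genetic operators described above (random initial chromosomes in the 0 to 100 range, crossover, competition, replacement, and mutation) can be sketched as follows. This is not the SLEUTH-GA code: the population size, mutation rate, selection scheme, and number of replaced individuals are assumptions, and the fitness function is a stand-in, since the real fitness comes from running SLEUTH and scoring its goodness of fit.

```python
import random

N_GENES = 5                    # one gene per SLEUTH growth coefficient
GENE_MIN, GENE_MAX = 0, 100    # integer range given in the text

def random_chromosome(rng):
    return [rng.randint(GENE_MIN, GENE_MAX) for _ in range(N_GENES)]

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randint(1, N_GENES - 1)
    return a[:cut] + b[cut:]

def mutate(chrom, rng, rate=0.1):
    """Replace each gene with a fresh random value with probability `rate`."""
    return [rng.randint(GENE_MIN, GENE_MAX) if rng.random() < rate else g
            for g in chrom]

def evolve(fitness, pop_size=20, generations=30, n_replace=2, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Competition (assumed scheme): the fitter half survives as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents) - n_replace:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        # Replacement: brand-new random chromosomes can enter at any generation.
        newcomers = [random_chromosome(rng) for _ in range(n_replace)]
        population = parents + children + newcomers
    return max(population, key=fitness)

# Stand-in fitness (hypothetical): closeness to an arbitrary target vector.
TARGET = [10, 40, 60, 25, 80]

def toy_fitness(chrom):
    return -sum(abs(g - t) for g, t in zip(chrom, TARGET))

best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Keeping the parents unchanged between generations means the best chromosome found so far is never lost, while the newcomers give random genes a chance to become the most fit at any time.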

Mapping the calibration

Choropleth maps were prepared by subtile for all of the metrics using R. First, the 5 performance measures were plotted (Fig. 10).

The maximum and mean OSM values showed the most range, and similar spatial distributions with highs in northern California (Zones 1 and 2), and lows in Central California (Zone 5 W). The independent metrics were universally high, with lows in Zone 5 W, but even there interspersed with high values.

Similar maps were prepared for the five best fit SLEUTH coefficients

Forecasting with SLEUTH

With the calibration complete, the next stage was to run simulations using the best sets of coefficients for the period 2017–2100. As forecasts, there can be no accuracy measure comparable to the calibration, but the accuracy measures are believed to be indicators of the confidence or reliability of the forecasts. Maps of the actual forecasts are to be published elsewhere. As a simple summary, Table 5 lists the numbers of hectares in each of the 13 land use classes for 2001 and 2100. Notable is

Uncertainty in the calibrations and forecast

There are three sources of uncertainty in the SLEUTH modeling effort: the data, the calibration and the forecasts. The MRLC land use data are based on Landsat satellite and other data, and the extraction of classified land use from such data is known to be imperfect for reasons of the vagueness in class descriptions, imprecision in class semantics and other inaccuracy in the classification process. That said, Wickham et al. (2017) found that the MRLC single-date overall accuracies were 82%,

Big data and SLEUTH

Lessons were learned through having to deal with massive amounts of data in SLEUTH modeling. Three keys to bulk data processing were: (1) tiling the data into manageable chunks that allowed each subtile to be calibrated (and forecast) largely independently; (2) simplification of the number of data layers and the use of the genetic algorithm made the model application tractable in terms of CPU time; and (3) the tile naming convention made it possible to repeatedly use the same scenario files and

Conclusion

This study set out to investigate whether spatial autocorrelation exists among SLEUTH calibrations when massive data sets are spatially tiled to make computations possible. Data for the state of California on land use and other factors was used at a 30 m resolution, which required divisions into State Plane zones, then tiles and in most cases subtiles. Of the 192 subtiles, only 174 contained data for calibration. SLEUTH was calibrated for these subtiles using repeated changes to scenario files

References (35)

  • K.C. Clarke

  • K.C. Clarke

    Land use change modeling with SLEUTH: Improving calibration with a genetic algorithm. Chapter 8 (pp. 139-162)

  • K.C. Clarke

    Mathematical foundations of cellular automata and complexity theory

  • K.C. Clarke et al.

    Methods and techniques for rigorous calibration of a cellular automaton model of urban growth

  • M.D. Clarke-Lauer et al.

    Evolving simulation modeling: Calibrating SLEUTH using a genetic algorithm

  • C. Dietzel et al.

    The effects of disaggregating land use categories in cellular automata during model calibration and forecasting

    Computers, Environment and Urban Systems

    (2005)

  • C. Dietzel et al.

    Toward optimal calibration of the SLEUTH land use change model

    Transactions in GIS

    (2007)