MLAs land cover mapping performance across varying geomorphology with Landsat OLI-8 and minimum human intervention

https://doi.org/10.1016/j.ecoinf.2021.101227

Highlights

  • Performance of MLAs in land cover mapping was compared across geomorphology.

  • AUC and ROC were used in the performance comparison.

  • RF showed higher precision in land cover mapping with minimal human intervention.

Abstract

Machine learning algorithms (MLAs) can classify land cover automatically from huge volumes of data and are prevalent in land mapping applications. Minimal human intervention is desirable when producing land cover products over a large area, and the choice of algorithm may determine the precision of the map. This study compares the performance of random forest (RF), decision tree (DT), support vector machine (SVM) and artificial neural network (ANN) algorithms in mapping three typical landscapes (plain, foothill, and mountain) in Hunan Province, China, with minimal human intervention. Performance comparisons among the four machine learning algorithms are based on ROC curves, AUC values, confusion matrices, overall accuracy, spatial comparisons, and inconsistency along altitude and slope. RF produced the most accurate maps (93.0% in the mountain area, 93.1% in the plain region, and 95.2% in the foothill region) across the various geomorphologies with minimal human intervention, and was the most resistant to landscape pattern complexity. DT performed similarly to RF, with similar ROC curves and slightly lower accuracy. SVM and ANN showed relatively poor performance without significant human intervention. RF produced robust and highly accurate land cover maps over large areas and various complex geomorphologies with little human intervention.

Introduction

With the explosion of population and economic prosperity, land cover around the world changed more rapidly in the second half of the twentieth century than at any other time in human history (MA, 2005). Human land management and land policy play an important role in land use change (Kuriqi et al., 2020). Land cover types and land use have shifted substantially over recent decades, and these changes may have more significant effects on the globe than climate change (Change, 2000; Foley et al., 2005). In turn, land use change affects the environment and human well-being (Ali et al., 2020). Therefore, land cover monitoring over broad extents is necessary to identify the impacts of land cover change on social and physical environmental conditions (Chen et al., 2016; Foody, 2002). Remote sensing is the most useful method available for observing features on the earth over long time periods and wide spatial extents (Gross et al., 2013; Lillesand et al., 2015). Given the characteristics of Earth-observing satellites, large-scale classification of land cover from remotely sensed data is essential for estimating land cover change (Otukei and Blaschke, 2010; Rodriguez-Galiano et al., 2012).

Today, with the explosion of multi-source and multi-temporal remotely sensed data, processing huge volumes of satellite data is highly time-consuming, which prevents the full application of remote sensing (Rogan et al., 2008). There is high demand for accurate, automated processing techniques for land cover mapping (Lu and Weng, 2007). Scholars have accordingly proposed a number of classification approaches. ISODATA, K-means, and minimum distance to means are common traditional methods in land classification. With the development of computer science, advanced techniques including machine learning algorithms (MLAs), for instance decision tree (DT), random forest (RF), artificial neural network (ANN) and support vector machine (SVM), have sprung up to handle huge volumes of satellite data (Otukei and Blaschke, 2010). Compared with conventional parametric algorithms, MLAs offer a more accurate and efficient alternative for land cover classification when managing large data volumes and complex landscapes (Cracknell and Reading, 2014; Rogan et al., 2008).

Unfortunately, MLAs come with several limitations for land cover classification over broad extents (Rodriguez-Galiano et al., 2012). Over a large area, very large data volumes and time-consuming data processing, integration and interpretation make minimal human intervention during land cover mapping highly desirable. Although human intervention is important for achieving a good classification, suitable parameters and “good” kernels for MLAs are hard for mappers to select (Rodriguez-Galiano et al., 2012). Easily operated and readily automated MLAs that need as little human intervention as possible are therefore required for land cover mapping over large areas (Srivastava et al., 2012). The classification performance of algorithms varies across geomorphologies, and it remains unclear which classification algorithms are best suited to different types of land cover (Rogan and Miller, 2006; Shao and Lunetta, 2012). How machine learning algorithms perform in terms of accuracy, computational complexity, and other indicators across diverse geomorphologies under minimum human intervention is also unclear. Consequently, selecting a classification algorithm, and identifying which algorithms perform well over different landscapes, has become particularly important for large-area mapping with minimum human intervention. However, performance comparisons of MLAs in land cover mapping with minimum human intervention across diverse geomorphologies are rare.

In this study, we explored the performance of four MLAs (SVM, ANN, RF, and DT) for land cover mapping over three typical landscapes (plain, foothill, and mountain) in Hunan Province, China, in 2017 with Landsat-8 OLI data. The classifications were conducted with minimal human intervention as the primary goal. These four MLAs were chosen because they are widely applied in land cover classification over large areas with remotely sensed data (Belgiu and Drăguţ, 2016; Mountrakis et al., 2011). First, we assessed the algorithms' performance on different geomorphologies using the ROC curve and AUC value. We then compared their classification accuracy and spatial consistency indicators. Finally, we compared the distribution of accuracy and inconsistency of each algorithm along altitude and slope. The main objective of this study was to identify the performance of the four MLAs in land cover mapping with minimal human intervention across variable geomorphology. This study aids mappers in selecting MLAs for land cover mapping in practice.
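The classification accuracy comparison rests on a confusion matrix and the overall accuracy derived from it. The following Python sketch shows that calculation under stated assumptions: the class names, the sample labels, and the helper names `confusion_matrix` and `overall_accuracy` are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(reference: Sequence[str],
                     predicted: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows are reference (validation) classes, columns are predicted classes."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    counts = Counter(zip(reference, predicted))
    return [[counts[(ref, pred)] for pred in classes] for ref in classes]


def overall_accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Share of samples on the diagonal, i.e. correctly classified samples."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical validation samples (illustrative only, not the study's data).
classes = ["woodland", "paddy field", "dryland"]
reference = ["woodland"] * 4 + ["paddy field"] * 3 + ["dryland"] * 3
predicted = ["woodland", "woodland", "woodland", "dryland",
             "paddy field", "paddy field", "woodland",
             "dryland", "dryland", "paddy field"]

matrix = confusion_matrix(reference, predicted, classes)
accuracy = overall_accuracy(matrix)
```

In this made-up example, 7 of the 10 samples fall on the diagonal, so the overall accuracy is 0.7. Computing one matrix per algorithm and landscape would allow the kind of side-by-side comparison the study describes.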


Study areas

The study area includes three typical regions in Hunan Province: plain, foothill and mountain (Fig. 1). Hunan is a province of China located south of the Yangtze River. The region lies in the subtropical monsoon climate zone, where winter is cold and snowy and summer is hot and dry. The annual average temperature in Hunan ranges from 16 °C to 18 °C, and its annual average rainfall varies from 1200 mm to 1700 mm (Ying et al., 2007). Because of the location in the

Methodology

The whole workflow of this study is described as follows (Fig. 2):

  • Input data preparation. Dual-temporal (growing season and non-growing season) Landsat-8/OLI images and the corresponding indices across varying geomorphology were prepared, as well as the terrain. These input data were then segmented with the multiresolution segmentation algorithm.

  • Training and classification. Training and validation samples were acquired by random selection in Google Earth. The training samples were
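The excerpt does not name the indices that accompany the Landsat-8/OLI bands. As one common possibility, the sketch below computes the normalized difference vegetation index (NDVI) from the OLI near-infrared band (band 5) and red band (band 4). The choice of NDVI, the reflectance values, and the function names are assumptions made for illustration, not the authors' documented workflow.

```python
from typing import List, Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        # Convention chosen for this sketch: treat empty pixels as 0.0.
        return 0.0
    return (nir - red) / total


def ndvi_image(nir_band: Sequence[Sequence[float]],
               red_band: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply NDVI pixel by pixel to two equally sized 2-D bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical surface reflectance values for a 2 x 2 window.
nir_band = [[0.5, 0.3], [0.25, 0.0]]   # assumed OLI band 5 (near infrared)
red_band = [[0.1, 0.3], [0.05, 0.0]]   # assumed OLI band 4 (red)

index_image = ndvi_image(nir_band, red_band)
```

With dual-temporal imagery, the same function could be applied once to the growing-season scene and once to the non-growing-season scene, giving two index layers per location.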

Validation of machine learning algorithms performance

The ROC curve of each algorithm is shown in Fig. 4. In the mountain area (125040), all the MLAs effectively classified woodlands and paddy fields, with AUC values greater than 0.9. The ANN did not classify dryland or wetland as well as the other algorithms (Fig. 4D, J). The AUC values of RF, DT, SVM, and ANN on impervious surfaces were 0.822, 0.809, 0.778, and 0.749, respectively. According to the AUC values, RF and DT performed better than SVM and ANN in the mountain area.
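Per-class AUC values of this kind can be computed in a one-vs-rest fashion, although the excerpt does not confirm that the study did so. Under that assumption, the AUC of a class is the probability that a randomly chosen sample of that class receives a higher score than a randomly chosen sample of any other class. The Python sketch below uses this rank-based (Mann-Whitney) formulation; the scores, labels, and function names are hypothetical.

```python
from typing import Dict, Sequence


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as the chance that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for s, p in zip(scores, is_positive) if p]
    negatives = [s for s, p in zip(scores, is_positive) if not p]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(class_scores: Dict[str, Sequence[float]],
                    true_labels: Sequence[str]) -> Dict[str, float]:
    """Per-class AUC: each class in turn is treated as the positive class."""
    return {
        cls: binary_auc(scores, [label == cls for label in true_labels])
        for cls, scores in class_scores.items()
    }


# Hypothetical validation labels and classifier scores (not from the study).
labels = ["woodland", "woodland", "dryland",
          "impervious", "dryland", "impervious"]
scores = {
    "woodland":   [0.9, 0.8, 0.1, 0.2, 0.3, 0.1],
    "dryland":    [0.1, 0.1, 0.6, 0.5, 0.4, 0.2],
    "impervious": [0.0, 0.1, 0.3, 0.3, 0.3, 0.7],
}

per_class_auc = one_vs_rest_auc(scores, labels)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small validation set; a sort-based rank computation would be preferable for large ones.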

In the plains region (124040), the

Discussions

This study aimed to compare the performance of four general MLAs in land cover mapping across different geomorphologies with Landsat-8 OLI and minimal human intervention. We incorporated a suite of multi-temporal Landsat data and environmental variables. The same samples and variables were used with each MLA to guarantee a consistent comparison. The four classifiers indeed performed well across different geomorphologies with little human intervention. In

Conclusions

Land cover classification with satellite data is a promising approach to monitoring large-scale land surface information. When mapping land cover over large areas, classification algorithms that need minimal human intervention are desirable. The study compared the performance of the four most commonly used MLAs (DT, RF, ANN, SVM) to identify which method performs best under minimal human intervention across varying geomorphology. The result implied that RF is robust and highly

Declaration of Competing Interest

None.

Acknowledgements

This research was supported by the Open Fund of Changsha University of Science & Technology (kfj190108), the Scientific Research Foundation of Hunan Education Department (19C0043, 17B004) and the National Natural Science Foundation of China (No. 41671446). We thank the reviewers and editors.

References (59)

  • D. Gross et al. Monitoring land cover changes in African protected areas in the 21st century. Ecol. Inf. (2013)

  • M. Hussain et al. Change detection from remotely sensed images: from pixel-based to object-based approaches. ISPRS J. Photogramm. Remote Sens. (2013)

  • B. Melville et al. Object-based random forest classification of Landsat ETM+ and WorldView-2 satellite imagery for mapping lowland native grassland communities in Tasmania, Australia. Int. J. Appl. Earth Obs. Geoinf. (2018)

  • G. Mountrakis et al. Support vector machines in remote sensing: a review. ISPRS J. Photogramm. Remote Sens. (2011)

  • L. Naidoo et al. Classification of savanna tree species, in the greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a random Forest data mining environment. ISPRS J. Photogramm. Remote Sens. (2012)

  • J.R. Otukei et al. Land cover change assessment using decision trees, support vector machines and maximum likelihood classification algorithms. Int. J. Appl. Earth Obs. Geoinf. (2010)

  • M. Pal et al. An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sens. Environ. (2003)

  • M. Paliwal et al. Neural networks and statistical techniques: a review of applications. Expert Syst. Appl. (2009)

  • D. Phiri et al. Effects of pre-processing methods on Landsat OLI-8 land cover classification using OBIA and random forests classifier. Int. J. Appl. Earth Obs. Geoinf. (2018)

  • V.F. Rodriguez-Galiano et al. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. (2012)

  • J. Rogan et al. Mapping land-cover modifications over large areas: a comparison of machine learning algorithms. Remote Sens. Environ. (2008)

  • Y. Shao et al. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS J. Photogramm. Remote Sens. (2012)

  • P.K. Srivastava et al. Selection of classification techniques for land use/land cover change investigation. Adv. Space Res. (2012)

  • B.W. Szuster et al. A comparison of classification techniques to support land cover and land use analysis in tropical coastal zones. Appl. Geogr. (2011)

  • J. Tan et al. Preliminary assessment of ecosystem risk based on IUCN criteria in a hierarchy of spatial domains: a case study in southwestern China. Biol. Conserv. (2017)

  • J. Tan et al. A novel and direct ecological risk assessment index for environmental degradation based on response curve approach and remotely sensed data. Ecol. Indic. (2019)

  • J. Tan et al. A SD-MaxEnt-CA model for simulating the landscape dynamic of natural ecosystem by considering socio-economic and natural impacts. Ecol. Model. (2019)

  • C.J. Tucker. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. (1979)

  • E. Vermote et al. Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sens. Environ. (2016)