Elsevier

Computers & Geosciences

Volume 144, November 2020, 104557

Land use/land cover recognition in arid zone using a multi-dimensional multi-grained residual forest

https://doi.org/10.1016/j.cageo.2020.104557

Abstract

Monitoring arid areas can bring substantial economic, ecological and social benefits. Recognizing the land cover or land use of arid areas from satellite images with machine learning methods is an effective monitoring approach. However, no public labeled dataset for arid areas currently exists, which restricts remote sensing image monitoring of desert areas. In addition, existing classification methods do not fully utilize the effective features of satellite images or their multi-spectral optical parameters. The contributions of this paper are as follows. First, we present a new satellite dataset named ARID-5 for arid-area land cover/land use (LULC) classification; the LULC types it covers include desert, oasis, Gobi and water systems. Second, we propose a machine learning algorithm named the multi-dimensional multi-grained residual forest for LULC recognition in arid areas. In this algorithm, the multi-dimensional multi-grained structure effectively extracts image features and spectral information. The residual forest structure maps probability feature vectors to higher levels for prediction, which improves how well the forest structure represents the samples. Base estimators are also passed between cascade layers, which improves both diversity and accuracy. Experimental results show that the multi-dimensional multi-grained residual forest has good classification ability. Finally, we test our algorithm on the SAT-4 and SAT-6 datasets, which demonstrates its generalization performance.
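The cascade idea in the abstract comes from gcForest (Zhou and Feng, 2017). In that design, each layer's forests emit class-probability vectors that are handed to the next layer. The sketch below is a minimal plain gcForest-style cascade in Python with scikit-learn. It is not the authors' mgrForest: the residual mapping and the passing of base estimators between layers are not specified in this excerpt, so they are omitted.

Several details are illustrative assumptions rather than part of the source:

- the function names `fit_level`, `fit_cascade` and `predict_cascade`;
- one random forest plus one extra-trees forest per level;
- three levels;
- 3-fold out-of-fold probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def fit_level(X_in, y, seed):
    """Fit one cascade level; return its forests and their out-of-fold class vectors."""
    forests = [
        RandomForestClassifier(n_estimators=50, random_state=seed),
        ExtraTreesClassifier(n_estimators=50, random_state=seed),
    ]
    class_vectors = []
    for forest in forests:
        # Out-of-fold probabilities: each training sample is scored by a model
        # that did not see it, so the next level is not fed overconfident vectors.
        class_vectors.append(
            cross_val_predict(forest, X_in, y, cv=3, method="predict_proba")
        )
        forest.fit(X_in, y)  # refit on all samples for use at inference time
    return forests, np.hstack(class_vectors)


def fit_cascade(X, y, n_levels=3):
    """Stack n_levels cascade levels (hypothetical helper, plain gcForest style)."""
    levels, X_in = [], X
    for level in range(n_levels):
        forests, class_vectors = fit_level(X_in, y, seed=level)
        levels.append(forests)
        # The next level sees the original features plus this level's class vectors.
        X_in = np.hstack([X, class_vectors])
    return levels


def predict_cascade(levels, X):
    """Propagate samples through every level and vote with the last one."""
    X_in = X
    for forests in levels:
        probas = [forest.predict_proba(X_in) for forest in forests]
        X_in = np.hstack([X] + probas)
    # Average the last level's class vectors and pick the most probable class.
    classes = levels[-1][0].classes_
    return classes[np.mean(probas, axis=0).argmax(axis=1)]


# Usage on synthetic data (no remote sensing data involved).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
levels = fit_cascade(X, y)
pred = predict_cascade(levels, X)
```

The out-of-fold step is the main design choice here. A forest's probabilities on its own training samples are close to one-hot, so feeding them forward unchanged would let later levels simply copy the first level's fit.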

Introduction

Land desertification is one of the biggest environmental problems facing the world today. The rapid expansion of desert areas has caused environmental degradation and huge economic losses, which in turn have given rise to political instability and social security problems in these areas. The desert-oasis eco-interlacing zone is a narrow zone between the desert and the oasis. It acts as the boundary and corridor between desert and oasis ecosystems and controls their energy and nutrient flows (Wang et al., 2007). Because of the dry climate and improper human intervention, this zone has become the area most sensitive to human activities such as land and water resource development (Qiao and Wang, 2019), which can severely deplete natural vegetation (Li et al., 2016). The stability of oasis ecosystems is also affected by desertification of oasis-desert ecotones and by salinization within oases. These changes lead to drought and desertification, hindering sustainable agricultural development. There is therefore an urgent need to understand the interactions between the oasis and the desert (Liu et al., 2007). Desertification is already a global concern (Henry, 2002).

In recent years, various machine learning methods have been used to identify and classify land cover in remote sensing images, with good results (Sun et al., 2019; Martiniano et al., 2018; Buddhiraju and Rizvi, 2010; Shih et al., 2019; Zhai et al., 2019a, Zhai et al., 2019b). Gaussian-based edge detection methods can extract edge information effectively (Basu, 2002), but their computational speed, convergence and multi-scale correlation remain unsatisfactory. Maximum Likelihood Classification (MLC) was combined with remote sensing and geographic information to map and monitor land use/land cover (LULC) changes in the coastal areas of northwestern Egypt (Shalaby and Tateishi, 2007). MLC is a parametric classification method. It performs well when the class distributions are estimated accurately, but it is very sensitive to these parameter estimates, and incorrect estimates can significantly degrade the results. The Support Vector Machine (SVM) has also been used to classify land cover (Kavzoglu and Colkesen, 2009; Liu and Chen, 2019; Petropoulos et al., 2012). In that work, texture features were extracted with fractal dimensions, gray level co-occurrence matrices and wavelet transforms, and the land cover was then identified by an SVM with a Radial Basis Function (RBF) kernel. The gray level co-occurrence matrix was also used to extract texture features of hyperspectral images (Xin et al., 2014); it is fast, but its results on some large local regions were unsatisfactory. The Random Forest (RF) (Liaw and Wiener, 2002) is widely used on remote sensing images (Halmy et al., 2015). RF can process high-dimensional data (Zhang et al., 2019) and generalizes well, but it over-fits easily when the data are noisy. Moreover, when features take different numbers of distinct values, features with more distinct values tend to have a greater influence on the random forest.

With the development of convolutional neural networks (CNNs), CNNs have been widely used in LULC classification (Helber et al., 2019; Rakshit et al., 2018; Zhai et al., 2019a, Zhai et al., 2019b). Keshk and Yin (2020) applied a CNN to desert classification. However, neural networks are more time-consuming than classical machine learning algorithms because training relies on extensive backpropagation. Real-time detection in remote sensing images therefore requires substantial computing resources (Xia et al., 2019; Roy et al., 2011; Jiao et al., 2019).
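The MLC sensitivity to parameter estimates noted above can be illustrated with a minimal Gaussian maximum-likelihood classifier. The sketch makes the following assumptions:

- equal class priors and a full covariance matrix per class;
- two hypothetical spectral bands with synthetic pixel values;
- illustrative function names (`fit_mlc`, `predict_mlc`).

It is a generic textbook form of MLC, not the configuration used by Shalaby and Tateishi (2007).

```python
import numpy as np


def fit_mlc(X, y):
    """Estimate each class's mean vector and covariance matrix from training pixels."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params


def predict_mlc(params, X):
    """Assign each pixel to the class with the largest Gaussian log-likelihood.

    Equal class priors are assumed, so the prior term is dropped.
    """
    classes = np.array(list(params))
    scores = []
    for c in classes:
        mean, cov = params[c]
        diff = X - mean
        inv_cov = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every pixel to the class mean.
        mahal = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        # Log-likelihood up to the constant -d/2 * log(2*pi), shared by all classes.
        scores.append(-0.5 * (logdet + mahal))
    return classes[np.argmax(np.stack(scores), axis=0)]


# Hypothetical two-band pixels from two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.2, 0.3], 0.05, size=(200, 2)),
    rng.normal([0.5, 0.4], 0.05, size=(200, 2)),
])
y = np.repeat([0, 1], 200)

params = fit_mlc(X, y)
acc = np.mean(predict_mlc(params, X) == y)

# Misestimate class 1's mean (e.g. from unrepresentative training pixels).
bad_params = dict(params)
bad_params[1] = (np.array([0.25, 0.32]), params[1][1])
bad_acc = np.mean(predict_mlc(bad_params, X) == y)
```

With correct estimates the two classes are separated almost perfectly. Moving one class mean toward the other shifts the decision boundary into class 0, and many class 0 pixels are mislabeled. This is the failure mode the paragraph above describes.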

In fact, little research has targeted LULC classification in arid regions, because no public labeled dataset for arid areas currently exists. Many studies have focused on the spatial and spectral features of remote sensing images (Rasti et al., 2019; Lei et al., 2019; Imani and Ghassemian, 2020). They show that effectively extracting spatial and spectral features is one key to improving classification accuracy. Spatially, the boundaries between desert, Gobi and oasis in arid areas are ambiguous, and different terrain types overlap with each other. As a result, traditional remote sensing image classification methods achieve relatively low accuracy in arid areas (Megahed et al., 2015). Spectrally, the field reflectance spectra of different types of sand and Gobi differ only slightly in the visible and near-infrared bands (400–1100 nm) (Wang and Gao, 1984), so many algorithms perform poorly on land cover/land use classification in desert areas. Both spectral and spatial information therefore deserve more attention in LULC classification of arid areas. LULC recognition based on remote sensing images of arid and semi-arid areas is an important approach to large-scale investigation of desert areas (Myint and Okin, 2009; Medjani et al., 2017). Alqurashi and Kumar (2014) used RF to detect land use and land cover changes in desert cities of Saudi Arabia, but its accuracy was limited, mainly because no effective feature extraction method was used. To address these problems, we present a new satellite dataset named ARID-5 for arid-area LULC classification and propose an algorithm that effectively extracts both spectral and spatial features.

In this paper, a multi-dimensional multi-grained residual forest (mgrForest) algorithm is proposed for land cover/land use classification on remote sensing images of arid areas. Remote sensing data of Northwest China were chosen as the research target data. The proposed method was compared with various classification methods such as SVM, RF and CNNs. The mgrForest was shown to extract both spatial features and spectral information from multispectral data, and it therefore achieved higher accuracy in recognizing desert, Gobi and oasis in multispectral data. In addition, our method has far fewer parameters requiring optimization than CNNs. Its advantages can be summarized as fewer parameters to set, easier training and faster classification.

In addition, to demonstrate the generalization performance of our algorithm, we compared it with existing algorithms on the SAT-4 and SAT-6 datasets and achieved state-of-the-art accuracy.

To summarize, our contributions are as follows: (1) we present a new satellite dataset named ARID-5 for arid-area LULC classification; (2) we propose a novel machine learning algorithm named mgrForest for land use/land cover recognition in arid areas; (3) we achieve state-of-the-art results on the SAT-4 and SAT-6 datasets.

Random forest

The RF is an ensemble learning algorithm (Liaw and Wiener, 2002). It is built on Bagging (Breiman, 1996) and uses decision trees as its base classifiers. A single decision tree classifier can hardly achieve both accuracy and diversity; in contrast, the RF algorithm overcomes the over-fitting and classification instability of a single decision tree. Different from the Bagging algorithm, the diversity of the RF is not only reflected in
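In Breiman's formulation, an RF combines two sources of randomness. The first is sample perturbation: each tree is trained on a bootstrap sample, as in Bagging. The second is feature perturbation: each split considers only a random subset of features. The sketch below builds a small forest by hand from scikit-learn decision trees to make both steps explicit.

The following are assumptions for illustration and do not come from the source:

- the helper names `fit_random_forest` and `predict_random_forest`;
- 25 trees and the `sqrt` feature-subset size;
- majority voting over integer labels;
- the synthetic data.

In practice `sklearn.ensemble.RandomForestClassifier` does this directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_random_forest(X, y, n_trees=25, seed=0):
    """Grow n_trees decision trees, each on a bootstrap sample with random feature subsets."""
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    trees = []
    for _ in range(n_trees):
        # Sample perturbation (as in Bagging): draw n_samples rows with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        # Feature perturbation: each split searches only sqrt(n_features) random features.
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(2**31))
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_random_forest(trees, X):
    """Majority vote over the trees; assumes labels are non-negative integers."""
    votes = np.stack([tree.predict(X) for tree in trees])  # (n_trees, n_samples)
    return np.array([np.bincount(column).argmax() for column in votes.T])


# Usage on synthetic data with a simple train/test split.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

trees = fit_random_forest(X_train, y_train)
y_pred = predict_random_forest(trees, X_test)
forest_acc = np.mean(y_pred == y_test)
```

Plain Bagging would keep the bootstrap line but let every split see all features (`max_features=None`). The random feature subsets are what decorrelate the trees further.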

Enhanced gcForest algorithm: multi-dimensional multi-grained residual forest algorithm

Zhou and Feng (2017) described in detail how to scan sequence data and single-channel image data, but they did not address the classification of multi-dimensional images. Their paper only mentioned that gcForest was tested on the CIFAR-10 dataset with unsatisfactory results. It did not explain in detail how to obtain feature re-representations by sliding-window scanning when the input is a multi-dimensional image. We therefore explored the sliding-window scanning approach further.
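One straightforward way to extend multi-grained scanning to a multi-band image is sketched below. It is assumed here for illustration and is not the scanning scheme of the mgrForest paper, which this excerpt does not describe. The idea is to slide a square window over the two spatial axes only, keeping all bands of each window together. Each instance is then a window × window × C vector, and running several window sizes gives the "multi-grained" part.

The function names `scan_multichannel` and `multi_grained_scan`, the window sizes (2, 4) and the 28 × 28 × 4 test image are hypothetical. The code needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def scan_multichannel(image, window, stride=1):
    """Cut an (H, W, C) image into flattened window x window x C instances."""
    # Slide only over the two spatial axes so the bands of each window stay together.
    patches = sliding_window_view(image, (window, window), axis=(0, 1))
    # patches: (H - window + 1, W - window + 1, C, window, window)
    patches = patches[::stride, ::stride]
    n_rows, n_cols = patches.shape[:2]
    # Move the band axis last so each instance is laid out as window x window x C.
    patches = patches.transpose(0, 1, 3, 4, 2)
    return patches.reshape(n_rows * n_cols, -1)


def multi_grained_scan(image, windows=(2, 4), stride=1):
    """Scan the same image at several window sizes (grains)."""
    return {w: scan_multichannel(image, w, stride) for w in windows}


# Hypothetical 28 x 28 image with 4 spectral bands.
image = np.arange(28 * 28 * 4, dtype=float).reshape(28, 28, 4)
feats = multi_grained_scan(image)
```

In gcForest, each set of window instances would then be passed through forests. Their class vectors are concatenated into the re-represented feature vector that feeds the cascade.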

Making the ARID-5 dataset

Our study selected Northwest China and its surrounding areas as the research target. The northwest region includes five provinces and autonomous regions: Shaanxi Province, Gansu Province, Qinghai Province, Ningxia Hui Autonomous Region and Xinjiang Uygur Autonomous Region. The region lies deep inland in China. It is known for its vast area, water shortage, widespread desert, heavy wind and sand, fragile ecology, sparse population, abundant resources and development

Results

All experiments were run on an Intel Core i7-8700K CPU with 32 GB of RAM and a single NVIDIA RTX 2070 GPU. The software environment was Python 3.6, with PyTorch and scikit-learn as the main scientific computing packages. Because the matrix computations of CNNs can be accelerated on the GPU, CNN training in this paper was GPU-based, whereas all other training used CPU-based parallel computing. Inference is computed by a single process on the

Conclusions and future work

In this work, we proposed a novel machine learning algorithm based on gcForest. It classifies remote sensing images of desert areas effectively and was faster and more accurate than the other algorithms compared. In addition, we achieved state-of-the-art results on the SAT-4 and SAT-6 datasets, which demonstrates the generalization performance of our algorithm. Furthermore, we created an open-access dataset named ARID-5 for land cover/land use classification in arid zones to support future studies.

In our future

Formatting of funding sources

This research was partly funded by the National Natural Science Foundation of PR China (No. 61773219).

Computer code availability

Name of code: mgrforest.py

Developer: Ming Qian et al., e-mail: [email protected]

Year first available: 2020

CPU: i7 8700, RAM: 32 GB

Experimental software: Python 3.6

Packages: PyTorch, scikit-learn

The trained networks are available at https://github.com/qianmingduowan/A-Multi-dimensional-Multi-grained-Residual-Forest.

The experimental dataset can be obtained from https://pan.baidu.com/s/1RjF3BSOzIDhmq0oAi7-xnA (password: dpcu).

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (61)

  • Abdullah F. Alqurashi et al.

    Land use and land cover change detection in the Saudi Arabian desert cities of Makkah and Al-Taif using satellite data

    Adv. Rem. Sens.

    (2014)
  • T. Amuti et al.

    Analysis of land cover change and its driving forces in a desert oasis landscape of Xinjiang, northwest China

    Solid Earth

    (2014)
  • M. Basu

    Gaussian-based edge-detection methods-a survey

    IEEE Trans. Syst. Man Cybern. C Appl. Rev.

    (2002)
  • S. Basu et al.

    DeepSat: a Learning Framework for Satellite Imagery

    (2015)
  • S. Basu et al.

    A theoretical analysis of Deep Neural Networks for texture classification

  • L. Breiman

    Bagging predictors

    Mach. Learn.

    (1996)
  • K.M. Buddhiraju et al.

    Comparison of CBF, ANN and SVM classifiers for object based classification of high resolution satellite images

    2010 IEEE Int. Geosci. Remote Sens. Symposium

    (2010)
  • T. Chen et al.

    XGBoost: a scalable tree boosting system

  • Z. Gong et al.

    Diversity-promoting deep structural metric learning for remote sensing scene classification

    IEEE Trans. Geosci. Rem. Sens.

    (2018)
  • K. He et al.

    Deep residual learning for image recognition

  • P. Helber et al.

    EuroSAT: a novel dataset and deep learning benchmark for land use and land cover classification

    IEEE J. Selected Topics in Appl. Earth Observations and Remote Sensing

    (2019)
  • N. Henry

    Man-made deserts: desertization processes and threats

    Arid Land Res. Manag.

    (2002)
  • T.K. Ho

    Random decision forests

    (1995)
  • L. Jiao et al.

    A hierarchical classification framework of satellite multispectral/hyperspectral images for mapping coastal wetlands

    Rem. Sens.

    (2019)
  • A. Karlin et al.

    Using SPOT and Aerial False-Color Infrared (fCIR) Imagery to Verify Floodplain Model Results in West Central Florida

    (2016)
  • T. Kavzoglu et al.

    A kernel functions analysis for support vector machines for land cover classification

    Int. J. Appl. Earth Obs. Geoinf.

    (2009)
  • S. Keller

    Physical Geography of China

    (2016)
  • H.M. Keshk et al.

    Classification of EgyptSat-1 Images Using Deep Learning Methods

    (2020)
  • P. Kontschieder et al.

    Deep neural decision forests

    2015 IEEE Int. Confer. Comput. Vision (ICCV)

    (2015)
  • A. Liaw et al.

    Classification and regression by random forest

    R News

    (2002)