Elsevier

CATENA

Volume 187, April 2020, 104396

Application of alternating decision tree with AdaBoost and bagging ensembles for landslide susceptibility mapping

https://doi.org/10.1016/j.catena.2019.104396

Highlights

  • Landslide susceptibility mapping models in a loess area were evaluated.

  • ADTree, ADTree with AdaBoost, and ADTree with Bagging were applied.

  • The impact of each factor on landslide occurrence was analyzed in detail.

  • Ensemble models improve on the accuracy of the ADTree model applied alone.

  • ADTree with AdaBoost shows the best result in landslide prediction.

Abstract

Landslides are a common type of natural disaster that poses a serious threat to human lives and economic development around the world, especially on the Chinese Loess Plateau. Longxian County (Shaanxi Province, China), a landslide-prone area in the southwestern part of the Loess Plateau, was selected as the study area. The main purpose of this paper is to map landslide susceptibility using the alternating decision tree (ADTree) together with new GIS-based ensemble techniques that combine ADTree with bootstrap aggregation (Bagging) and with adaptive boosting (AdaBoost). First, a landslide inventory map was prepared from 171 identified historical landslide events in the study area; 120 landslides (70%) were randomly selected as the training dataset and the remaining 51 landslides (30%) were used as the validation dataset. Subsequently, eleven landslide conditioning factors were considered for the landslide susceptibility mapping. The selection of conditioning factors was then optimized using the correlation attribute evaluation method and Spearman’s rank correlation coefficient. Afterwards, landslide susceptibility maps were generated with the three models. Finally, the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC) and statistical measures were applied to evaluate and validate the performance of the models. The results show that the success rates of the ADTree model, the ADTree with Bagging (ADTree-Bagging) model and the ADTree with AdaBoost (ADTree-AdaBoost) model were 0.872, 0.917, and 0.984, respectively, while the prediction rates of the three models were 0.696, 0.752 and 0.787, respectively. In sum, the two proposed ensemble models exhibited better performance than the ADTree model, and the ADTree-AdaBoost model was selected as the best model in the study. Hence, ensemble techniques can provide new and promising methods for spatial prediction and zonation of landslide susceptibility.
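The AUC measure used above for evaluation can be illustrated with a minimal sketch. It relies on the rank-based identity that the AUC equals the probability that a randomly chosen landslide sample receives a higher susceptibility score than a randomly chosen non-landslide sample. In this literature the success rate is usually the AUC computed on the training data and the prediction rate the AUC on the validation data; that reading is assumed here. The function name `auc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for landslide, 0 for non-landslide.
    scores: susceptibility values produced by any model.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count landslide/non-landslide pairs ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 8 of the 9 pairs are ordered correctly.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred inventory points; a sort-based variant would be preferable for full raster grids.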

Introduction

A landslide is the downslope movement of masses of soil, rock, and organic material under the driving force of gravity (Highland and Bobrowsky, 2008). Landslides not only play an important role in geomorphological evolution but are also a common and catastrophic geological disaster throughout the world, causing billions of dollars in economic losses and thousands of deaths and injuries each year (Guzzetti et al., 1999, Highland and Bobrowsky, 2008, Huang and Fan, 2013). China is one of the countries where geohazards are frequent and widely distributed; one-third of its geohazards occur on the Loess Plateau, and 85% of those are landslides (Xu et al., 2014, Wu et al., 2016a, Wu et al., 2016b, Zhuang et al., 2018). The Loess Plateau lies in the middle reaches of the Yellow River and covers approximately 430,000 km², about 4.4% of China's territory. Landslide frequency in this area is increasing with time because of infrastructure construction and growing demand for land. The most catastrophic landslides, which occurred in 1920 in the Haiyuan area of the Loess Plateau, caused over 10,000 casualties (Liu, 1985, Derbyshire, 2001, Zhang and Wang, 2007, Li et al., 2014, Wang et al., 2014). Moreover, landslides can trigger multiple-hazard effects in an area and bring even greater damage. Landslide susceptibility mapping (LSM) can provide an estimate of where landslides are likely to occur (Guzzetti, 2006). It is therefore worthwhile to carry out LSM for landslide-prone areas, giving local governments valuable information for integrated landslide prevention plans as well as for land planning and utilization (Hong et al., 2016).

With the aid of fast-developing geographic information system (GIS) and soft computing techniques, various modeling approaches have been applied to landslide spatial analysis and susceptibility mapping around the world over the last three decades (Carrara et al., 1991, Chen et al., 2017c). These approaches can be broadly classified as heuristic, deterministic, and statistical (Dai et al., 2002). Recently, ensemble techniques have also been applied and have exhibited significantly greater predictive power than individually applied models in landslide modelling (Dietterich, 2000, Althuwaynee et al., 2014, Chen et al., 2017a).

A heuristic (qualitative) approach relies on expert opinion and experience to assign weights to the different landslide conditioning factors when estimating landslide potential (Dahal et al., 2008). However, this approach has limitations: inadequate knowledge of the study area can produce unacceptable results, and reproducibility is poor (Dahal et al., 2008, Yilmaz, 2009). The analytical hierarchy process and weighted linear combination are two heuristic approaches commonly used in LSM (Ayalew et al., 2004, Yalcin, 2008, Shahabi et al., 2014). A deterministic (quantitative) approach, in turn, can only be applied at large scale in small areas where landslide types are simple and ground conditions are fairly uniform (Dai et al., 2001, Yilmaz, 2009).

To overcome the limitations of these two approaches, statistical methods were developed and applied to generate reliable LSM. Based on large quantities of carefully collected geo-environmental data, they efficiently analyze the statistical relations between historical landslides and conditioning factors in a given area (Guzzetti et al., 2006, Rossi and Reichenbach, 2016, Althuwaynee et al., 2016). Reichenbach et al. (2018) further classified statistically based models into six groups: classical statistics, index-based, machine learning, neural networks, multi-criteria decision analysis, and other statistics. Among these, classical statistics (logistic regression, linear regression, etc.) and index-based methods such as weight of evidence were the most frequently used. Various attempts have been made to apply newer data mining and machine learning algorithms to landslide susceptibility mapping, such as artificial neural networks (Yilmaz, 2009), support vector machines (Huang and Zhao, 2018), naïve Bayes (Tsangaratos and Ilia, 2016), and decision trees (Yeon et al., 2010, Zhang et al., 2017). Moreover, decision tree algorithms that extend the basic approach, namely the alternating decision tree (ADTree) (Pham et al., 2016a), classification and regression trees (CART) (Youssef et al., 2016), the J48 decision tree (Tien Bui et al., 2014) and random forest (RF) (Stumpf and Kerle, 2011, Trigila et al., 2015), have also been proposed in LSM research, classifying efficiently and achieving promising predictive power. More importantly, ADTree, as a boosting decision tree procedure, can produce very accurate classifiers that are easier to interpret than those of some other decision tree algorithms such as C4.5, CART, RF, and single decision trees (Freund and Mason, 1999). It is therefore worthwhile to explore its potential for landslide prediction in this study.

Because they can enhance the prediction accuracy of models, ensemble techniques have gradually received great attention from researchers in different fields worldwide, but they have rarely been used in landslide prediction (Lee and Oh, 2012, Tien Bui et al., 2014, Althuwaynee et al., 2014, Reichenbach et al., 2018). A growing number of hybrid and integrated models have been applied and evaluated in LSM studies, and each combination showed superior prediction performance with high reliability; however, more ensemble-based approaches still need to be explored and applied (Tien Bui et al., 2014, Althuwaynee et al., 2014, Althuwaynee et al., 2016, Youssef et al., 2015, Pham et al., 2016b, Pham et al., 2017a, Pham and Prakash, 2017). Combining different models is recommended to reduce model errors and enhance the reliability of landslide susceptibility prediction (Rossi et al., 2010). The AdaBoost (adaptive boosting) algorithm proposed by Freund and Schapire (1997) and the Bagging (bootstrap aggregation) algorithm introduced by Breiman (1994) are two of the earliest and most popular ensemble techniques, yet they have rarely been applied in landslide susceptibility analyses and need to be explored more widely. Bagging ensembles based on the J48 decision tree, functional decision tree (FDT), naïve Bayes, ADTree, and FR, as well as AdaBoost ensembles based on J48 and FDT, have been used and produced considerably better results than the individual models (Tien Bui et al., 2014, Tien Bui et al., 2016b, Pham and Prakash, 2017, Pham et al., 2017b).
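The core difference between the two ensemble techniques can be sketched in a few lines. Bagging trains each base learner on a bootstrap resample of the training data and combines them by unweighted vote, whereas AdaBoost reweights the training samples after every round so that the next base learner concentrates on previously misclassified samples. The sketch below assumes the common exponential form of discrete AdaBoost with labels coded as +1 (landslide) and -1 (non-landslide); the base learner itself (ADTree in this paper) is deliberately left out, and all function names are hypothetical rather than taken from any software used in the study.

```python
import math
import random


def bootstrap_indices(n, rng):
    """Bagging: draw n sample indices with replacement for one base learner."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(predictions):
    """Bagging: combine base-learner labels (+1 / -1) by unweighted vote."""
    return 1 if sum(predictions) > 0 else -1


def adaboost_round(weights, y_true, y_pred):
    """AdaBoost: one round of learner weighting and sample reweighting.

    Returns the learner weight alpha and the renormalized sample weights.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < err < 0.5:
        raise ValueError("base learner must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones down.
    new_w = [w * math.exp(-alpha * t * p)
             for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]


def weighted_vote(alphas, predictions):
    """AdaBoost: combine base-learner labels weighted by their alphas."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score > 0 else -1


# Toy round (invented labels): four samples, uniform weights, last one wrong.
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]
alpha, new_weights = adaboost_round([0.25] * 4, y_true, y_pred)

rng = random.Random(42)  # arbitrary seed, for repeatability only
sample = bootstrap_indices(10, rng)
```

In the toy round the single misclassified sample ends up carrying half of the total weight, which is what forces the next base learner to attend to it. A bootstrap sample, by contrast, leaves every training sample equally likely to be drawn in each round.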

The Loess Plateau is of great importance to the Silk Road Economic Belt promoted by the Chinese government, and disaster prevention and reduction as well as land resource utilization are essential to this giant economic development project. Loess is characterized by loose texture, vertical joints and fissures, macropores, and high sensitivity to water and suction stress, which make it prone to slope failure when subjected to rainfall, anthropogenic activities, erosion, and seismic shock; the widely distributed loess landslides therefore pose a great threat (Derbyshire, 2001, Zhang and Wang, 2007, Zhuang et al., 2018, Peng et al., 2019). Lei (2001) reported the landslides that occurred from the 1950s to 1992 in two provinces (Shaanxi and Gansu) of the Loess Plateau: there were 16,616 landslides in northern Shaanxi and 14,109 in Gansu, with densities of 5 and 6 landslides per square kilometer, respectively. Peng et al. (2014) further reported that 1131 landslides occurred in Shaanxi and 4576 in eastern Gansu since 2008. It is therefore necessary to carry out LSM in landslide-prone areas to provide information on the potential spatial occurrence of slope failures (Guzzetti, 2006). In recent years, researchers have successfully conducted LSM in several counties and river basins of the Loess Plateau using different models, including heuristic, deterministic and statistical approaches (Chen et al., 2014, Chen et al., 2015, Chen et al., 2016a, Chen et al., 2016b, Chen et al., 2017a, Z. Chen et al., 2017, Wang et al., 2015, Wu et al., 2017, Wu and Ke, 2016). Among these, frequency ratio and certainty factor were the most frequently used, but only a limited range of models has been applied, and no study of ensemble techniques in this area has been reported in the literature. Longxian (Shaanxi, China), a county located in the southwestern region of the Loess Plateau, has suffered greatly from frequent loess landslides, which makes it a valuable region for LSM research.

The current study aims to explore the potential application of ADTree and its two ensembles, namely Bagging and AdaBoost, to LSM in Longxian County, and to compare their overall performance. The ADTree-AdaBoost model, as a novel model, is proposed here for the first time in landslide prediction analysis. The application of the other two models remains poorly constrained and needs to be further explored. This paper provides new ideas and useful information for landslide-related research, and the landslide susceptibility maps can help decision makers to better utilize land resources, achieving economic development while bringing harmony between human beings and the fragile loess environment.

Section snippets

Description of the study area

The study area (Longxian County) belongs to Shaanxi Province and lies on the southwestern edge of the Loess Plateau, in the northwestern part of China (Fig. 1). Longxian County lies between longitudes 106°26′32″E and 107°08′11″E and latitudes 34°35′17″N and 35°06′45″N, covering an area of approximately 2418 km² with a population of 0.27 million as of 2016. The area comprises approximately 31.3% farmland, 18.0% bare land, 0.4% residential areas, 0.1% water bodies, and 50.2% forest and grass. The climate is

Data used

In the present study, five main datasets were collected: historical landslide records, a digital elevation model (DEM) with a resolution of 30 m, Landsat 8 Operational Land Imager (OLI) images with a spatial resolution of 30 m, Google Earth satellite images, and a lithology map at a scale of 1:200,000. The historical landslide records were sourced from the Nuclear Industry Geological Survey in Shaanxi Province. The DEM and Landsat 8 OLI images were obtained from Geospatial Data Cloud

Evaluation of landslide conditioning factors

It is necessary to evaluate the predictive capability of the landslide conditioning factors to obtain more accurate landslide susceptibility models, because some factors may have a negative effect on the generated models. Moreover, a strong correlation between factors also degrades model performance (Tien Bui et al., 2016b). Thus, in this study, an attribute evaluator called “CorrelationAttributeEval” (correlation attribute evaluation) was used to obtain the predictive
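The check for correlation between conditioning factors, which the abstract attributes to Spearman’s rank correlation coefficient, can be sketched as follows. The sketch computes the coefficient as the Pearson correlation of average ranks, which handles tied values. The factor names and sample values are hypothetical and are not taken from the study, and no cutoff for a "strong" correlation is implied.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values of two conditioning factors at five sample points.
slope_deg = [5.0, 12.0, 18.0, 25.0, 33.0]
altitude_m = [950.0, 1010.0, 1180.0, 1150.0, 1400.0]
rho = spearman(slope_deg, altitude_m)
```

For these invented values only one pair of neighbouring ranks is swapped, so the coefficient is 0.9. In a factor-screening step, every pair of conditioning factors would be compared in this way.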

Landslides distribution pattern

The landslide inventory map and the maps of the eleven landslide conditioning factors were used to obtain the landslide occurrence probability of each class. The frequency ratio (FR) values of each class are summarized in Table 1. For altitude, the highest FR value of 2.314 is found in the class of 1000–1200 m, and the value then decreases as the altitude increases. Additionally, over 92% of landslides occurred at altitudes below 1200 m, which is also the concentrated area of
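A minimal sketch of the frequency ratio calculation follows, assuming the definition commonly used in the LSM literature: the share of landslides falling in a factor class divided by the share of the study area covered by that class. Values above 1 then indicate a class in which landslides are over-represented. The class names and counts are invented for illustration and are not the values in Table 1.

```python
def frequency_ratio(landslides_per_class, pixels_per_class):
    """Frequency ratio per class: landslide share divided by area share."""
    total_landslides = sum(landslides_per_class.values())
    total_pixels = sum(pixels_per_class.values())
    if total_landslides == 0 or total_pixels == 0:
        raise ValueError("need at least one landslide and one pixel")
    ratios = {}
    for name, pixels in pixels_per_class.items():
        if pixels == 0:
            continue  # FR is undefined for a class that covers no area
        count = landslides_per_class.get(name, 0)
        # (count / total_landslides) / (pixels / total_pixels), rearranged
        # into a single division to limit floating-point rounding.
        ratios[name] = (count * total_pixels) / (total_landslides * pixels)
    return ratios


# Hypothetical counts, invented for illustration; not the values in Table 1.
landslides = {"class A": 30, "class B": 15, "class C": 5}
pixels = {"class A": 2000, "class B": 3000, "class C": 5000}
fr = frequency_ratio(landslides, pixels)
```

With these invented counts, class A holds 60% of the landslides on 20% of the area and so has an FR of 3, while class C holds 10% of the landslides on 50% of the area and has an FR of 0.2.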

Conclusions

In this research, the ADTree and its two ensembles, namely Bagging and AdaBoost, were applied for LSM in Longxian County (China). In addition, the ADTree-AdaBoost, as a novel combination model, was applied in landslide susceptibility modeling for the first time. The performance of the three models was systematically evaluated and analyzed to select the best model for the study area. In sum, the following conclusions can be drawn:

According to integrated analysis of landslide

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We are very thankful to Victor Jetten, editor of the CATENA journal, and two anonymous reviewers for their valuable comments and suggestions to improve the quality of our paper. This study was supported by the National Basic Research Program of China (973 Program, No. 2014CB744701) and the National Natural Science Foundation of China (No. 41072213). The authors acknowledge the PhD scholarships awarded to Yutian Ke (No. 201706180008) and Haoyuan Hong (No. 201906860029) by the China Scholarship Council.

References (101)

  • Y. Freund et al.

    A decision-theoretic generalization of on-line learning and an application to boosting

    J. Comput. Syst. Sci.

    (1997)
  • R.H. Guthrie et al.

    An examination of controls on debris flow mobility: evidence from coastal British Columbia

    Geomorphology

    (2010)
  • F. Guzzetti et al.

    Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy

    Geomorphology

    (1999)
  • F. Guzzetti et al.

    Estimating the quality of landslide susceptibility models

    Geomorphology

    (2006)
  • H. Hong et al.

    Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines

    Catena

    (2015)
  • H. Hong et al.

    Landslide susceptibility assessment in Lianhua County (China): a comparison between a random forest data mining technique and bivariate and multivariate statistical models

    Geomorphology

    (2016)
  • Y. Huang et al.

    Review on landslide susceptibility mapping using support vector machines

    Catena

    (2018)
  • L. Lombardo et al.

    Presenting logistic regression-based landslide susceptibility results

    Eng. Geol.

    (2018)
  • B. Martín et al.

    Influence of spatial heterogeneity and temporal variability in habitat selection: a case study on a great bustard metapopulation

    Ecol. Model.

    (2012)
  • M.A. Passman et al.

    Validation of venous clinical severity score (VCSS) with other venous severity assessment tools from the American venous forum, national venous screening program

    J. Vasc. Surg.

    (2011)
  • J. Peng et al.

    Heavy rainfall triggered loess–mudstone landslide and subsequent debris flow in Tianshui, China

    Eng. Geol.

    (2015)
  • J. Peng et al.

    Distribution and genetic types of loess landslides in China

    J. Asian Earth Sci.

    (2019)
  • B.T. Pham et al.

    Hybrid integration of Multilayer Perceptron Neural Networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS

    Catena

    (2017)
  • H.R. Pourghasemi et al.

    Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin, Iran

    Catena

    (2012)
  • B. Pradhan et al.

    Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling

    Environ. Modell. Softw.

    (2010)
  • O. Rahmati et al.

    PMT: New analytical framework for automated evaluation of geo-environmental modelling approaches

    Sci. Total Environ.

    (2019)
  • P. Reichenbach et al.

    A review of statistically-based landslide susceptibility models

    Earth-Sci. Rev.

    (2018)
  • M. Rossi et al.

    Optimal landslide susceptibility zonation based on multiple forecasts

    Geomorphology

    (2010)
  • H. Shahabi et al.

    Landslide susceptibility mapping at central Zab basin, Iran: a comparison between analytical hierarchy process, frequency ratio and logistic regression models

    Catena

    (2014)
  • R.C. Sidle et al.

    Erosion processes in steep terrain—truths, myths, and uncertainties related to forest management in Southeast Asia

    For. Ecol. Manage.

    (2006)
  • A. Stumpf et al.

    Object-oriented mapping of landslides using Random Forests

    Remote Sens. Environ.

    (2011)
  • Y. Thiery et al.

    Landslide susceptibility assessment by bivariate methods at large scales: application to a complex mountainous environment

    Geomorphology

    (2007)
  • A. Trigila et al.

    Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy)

    Geomorphology

    (2015)
  • P. Tsangaratos et al.

    Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: the influence of models complexity and training dataset size

    Catena

    (2016)
  • A. Yalcin

    GIS-based landslide susceptibility mapping using analytical hierarchy process and bivariate statistics in Ardesen (Turkey): comparisons of results and confirmations

    Catena

    (2008)
  • Y.K. Yeon et al.

    Landslide susceptibility mapping in Injae, Korea, using a decision tree

    Eng. Geol.

    (2010)
  • I. Yilmaz

    Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides (Tokat—Turkey)

    Comput. Geosci.

    (2009)
  • D. Zhang et al.

    Study of the 1920 Haiyuan earthquake-induced landslides in loess (China)

    Eng. Geol.

    (2007)
  • D. Zhang et al.

    A rapid loess flowslide triggered by irrigation in China

    Landslides

    (2009)
  • A. Akgun et al.

    Landslide susceptibility mapping for a landslide-prone area (Findikli, NE of Turkey) by likelihood-frequency ratio and weighted linear combination models

    Environ. Geol.

    (2008)
  • A.K. Akobeng

    Understanding diagnostic tests 3: receiver operating characteristic curves

    Acta Paediatr.

    (2007)
  • O.F. Althuwaynee et al.

    A novel ensemble decision tree-based CHi-squared Automatic Interaction Detection (CHAID) and multivariate logistic regression models in landslide susceptibility mapping

    Landslides

    (2014)
  • O.F. Althuwaynee et al.

    A novel integrated model for assessing landslide susceptibility mapping using CHAID and AHP pair-wise comparison

    Int. J. Remote Sens.

    (2016)
  • L. Ayalew et al.

    Landslide susceptibility mapping using GIS-based weighted linear combination, the case in Tsugawa area of Agano River, Niigata Prefecture, Japan

    Landslides

    (2004)
  • L. Breiman

    Bagging predictors

    Machine learning

    (1994)
  • A. Brenning

    Spatial prediction models for landslide hazards: review, comparison and evaluation

    Nat. Hazards Earth Syst. Sci.

    (2005)
  • A. Carrara et al.

    GIS techniques and statistical models in evaluating landslide hazard

    Earth Surf. Proc. Land

    (1991)
  • W. Chen et al.

    Application of weights-of-evidence model in landslide susceptibility mapping at Baozhong Region in Baoji, China

    Environ. Geol.

    (2014)
  • W. Chen et al.

    Landslide susceptibility mapping based on GIS and information value model for the Chencang District of Baoji, China

    Arab. J. Geosci.

    (2014)
  • W. Chen et al.

    Application of frequency ratio, statistical index, and index of entropy models and their comparison in landslide susceptibility mapping for the Baozhong Region of Baoji, China

    Arab. J. Geosci.

    (2015)