Application of alternating decision tree with AdaBoost and bagging ensembles for landslide susceptibility mapping
Introduction
A landslide is the downslope movement of masses of soil, rock, and organic material under the driving force of gravity (Highland and Bobrowsky, 2008). Landslides not only play an important role in geomorphological evolution but are also among the most common and catastrophic geological disasters worldwide, causing billions of dollars in economic losses and thousands of casualties and injuries each year (Guzzetti et al., 1999, Highland and Bobrowsky, 2008, Huang and Fan, 2013). China is one of the countries where geohazards occur frequently and are widely distributed; one-third of its geohazards happen in the Loess Plateau, and 85% of those are landslides (Xu et al., 2014, Wu et al., 2016a, Wu et al., 2016b, Zhuang et al., 2018). The Loess Plateau lies in the middle reaches of the Yellow River and covers approximately 430,000 km², about 4.4% of China's territorial area. Landslide frequency in this area is increasing over time because of infrastructure construction and growing land demand. The most catastrophic landslides, which happened in 1920 in the Haiyuan area of the Loess Plateau, caused over 10,000 casualties (Liu, 1985, Derbyshire, 2001, Zhang and Wang, 2007, Li et al., 2014, Wang et al., 2014). Moreover, landslides can trigger multiple-hazard effects in an area and bring even greater damage. Landslide susceptibility mapping (LSM) can estimate where landslides are likely to happen (Guzzetti, 2006). It is therefore worthwhile to produce LSM for landslide-prone areas, giving local governments valuable information for integrated landslide prevention plans as well as for land planning and utilization (Hong et al., 2016).
With the aid of fast-developing geographic information systems (GIS) and soft computing techniques, various modeling approaches have been applied to landslide spatial analysis and susceptibility mapping around the world over the last three decades (Carrara et al., 1991, Chen et al., 2017c). These approaches can be broadly classified as heuristic, deterministic, and statistical (Dai et al., 2002). Recently, ensemble techniques have also been applied and have exhibited significantly higher predictive power than individually applied models in landslide modelling (Dietterich, 2000, Althuwaynee et al., 2014, Chen et al., 2017a).
A heuristic (qualitative) approach relies on experts’ opinion and experience to assign weights to the different landslide conditioning factors when estimating landslide potential (Dahal et al., 2008). However, this approach is limited by poor reproducibility and by the unacceptable results that inadequate knowledge of the study area can produce (Dahal et al., 2008, Yilmaz, 2009). The analytical hierarchy process and weighted linear combination are two heuristic approaches commonly used in LSM (Ayalew et al., 2004, Yalcin, 2008, Shahabi et al., 2014). A deterministic (quantitative) approach, in turn, can only be applied at large scale in small areas where landslide types are simple and ground conditions are fairly uniform (Dai et al., 2001, Yilmaz, 2009).
To overcome the limitations of these two approaches, statistical methods were developed and applied to generate reliable LSM. Based on carefully collected, large quantities of geo-environmental data, they efficiently analyze the statistical relations between historical landslides and conditioning factors in a given area (Guzzetti et al., 2006, Rossi and Reichenbach, 2016, Althuwaynee et al., 2016). Reichenbach et al. (2018) further classified statistically based models into six groups: classical statistics, index-based, machine learning, neural networks, multi-criteria decision analysis, and other statistics. Among these, classical statistics (logistic regression, linear regression, etc.) and index-based methods such as weight of evidence were the most frequently used. Various attempts have been made to apply newer data mining and machine learning algorithms to landslide susceptibility mapping, such as artificial neural networks (Yilmaz, 2009), support vector machines (Huang and Zhao, 2018), naïve Bayes (Tsangaratos and Ilia, 2016), and decision trees (Yeon et al., 2010, Zhang et al., 2017). Moreover, decision-tree-based algorithms such as the alternating decision tree (ADTree) (Pham et al., 2016a), classification and regression trees (CART) (Youssef et al., 2016), the J48 decision tree (Tien Bui et al., 2014), and random forest (RF) (Stumpf and Kerle, 2011, Trigila et al., 2015) have also been proposed in LSM research, classifying efficiently and achieving promising predictive power. More importantly, ADTree, as a boosting decision tree procedure, can produce very accurate classifiers that are easier to interpret than those of some other decision tree algorithms such as C4.5, CART, RF, and the single decision tree (Freund and Mason, 1999). Its potential for landslide prediction is therefore explored in this study.
Because they can enhance the prediction accuracy of models, ensemble techniques have gradually attracted great attention from researchers in different fields worldwide, but they have rarely been used in landslide prediction (Lee and Oh, 2012, Tien Bui et al., 2014, Althuwaynee et al., 2014, Reichenbach et al., 2018). A growing number of hybrid and integrated models have been applied and evaluated in LSM studies, and each combination showed superior prediction performance with high reliability; however, more ensemble-based approaches still need to be explored and applied (Tien Bui et al., 2014, Althuwaynee et al., 2014, Althuwaynee et al., 2016, Youssef et al., 2015, Pham et al., 2016b, Pham et al., 2017a, Pham and Prakash, 2017). Combining different models is recommended to reduce model errors and enhance the reliability of landslide susceptibility prediction (Rossi et al., 2010). The AdaBoost (adaptive boosting) algorithm proposed by Freund and Schapire (1997) and the Bagging (bootstrap aggregation) algorithm introduced by Breiman (1994) are two of the earliest and most popular ensemble techniques, yet they have rarely been applied in landslide susceptibility analyses and deserve wider exploration. Bagging ensembles based on the J48 decision tree, functional decision tree (FDT), naïve Bayes, ADTree, and FR, as well as AdaBoost ensembles based on J48 and FDT, have been used and produced considerably better results than the individual models (Tien Bui et al., 2014, Tien Bui et al., 2016b, Pham and Prakash, 2017, Pham et al., 2017b).
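To make the two ensemble schemes concrete, the following minimal Python sketch implements discrete AdaBoost (sample re-weighting) and Bagging (bootstrap resampling with majority vote). It is an illustration only, not the implementation used in this study: a one-level decision stump stands in for the ADTree base learner, the ten-point data set is invented, and all function names are hypothetical.

```python
import math
import random


def stump_predict(stump, x):
    """A stump is (feature index, threshold, polarity); labels are +1 / -1."""
    j, t, p = stump
    return p if x[j] <= t else -p


def fit_stump(X, y, w):
    """Return the stump with the lowest weighted error, and that error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for p in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict((j, t, p), row) != yi)
                if err < best_err:
                    best, best_err = (j, t, p), err
    return best, best_err


def adaboost(X, y, rounds):
    """Discrete AdaBoost: re-weight samples so later stumps focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


def bagging(X, y, n_models, seed=0):
    """Bagging: fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        stump, _ = fit_stump(Xb, yb, [1.0 / n] * n)
        models.append(stump)
    return models


def bagging_predict(models, x):
    votes = sum(stump_predict(s, x) for s in models)  # unweighted majority vote
    return 1 if votes >= 0 else -1


def accuracy(predict, X, y):
    return sum(predict(row) == yi for row, yi in zip(X, y)) / len(y)


# Invented toy data: one feature, labels that no single stump can separate.
X = [[float(i)] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single, _ = fit_stump(X, y, [0.1] * 10)
boosted = adaboost(X, y, rounds=3)
bagged = bagging(X, y, n_models=11)

print("single stump:", accuracy(lambda r: stump_predict(single, r), X, y))
print("AdaBoost    :", accuracy(lambda r: adaboost_predict(boosted, r), X, y))
print("Bagging     :", accuracy(lambda r: bagging_predict(bagged, r), X, y))
```

On this toy set the best single stump classifies 7 of the 10 points correctly, while three boosting rounds classify all 10. The Bagging result depends on the random seed. With a real base learner such as ADTree, the same two wrappers would apply unchanged in principle.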
The Loess Plateau is of great importance to the Silk Road Economic Belt initiated by the Chinese government; disaster prevention and reduction as well as land resource utilization are essential to this giant economic development project. Loess is characterized by loose texture, vertical joints and fissures, macropores, and high sensitivity to water and suction stress, which make it prone to slope failure when subjected to rainfall, anthropogenic activities, erosion, and seismic shock; the widely distributed loess landslides therefore pose a great threat (Derbyshire, 2001, Zhang and Wang, 2007, Zhuang et al., 2018, Peng et al., 2019). Lei (2001) reported the landslides that occurred from the 1950s to 1992 in two provinces (Shaanxi and Gansu) in the Loess Plateau: there were 16,616 landslides in northern Shaanxi and 14,109 in Gansu, with densities of 5 and 6 landslides per square kilometer, respectively. Peng et al. (2014) further reported that 1131 landslides occurred in Shaanxi and 4576 in eastern Gansu since 2008. It is therefore necessary to carry out LSM in landslide-prone areas to provide information on the potential spatial occurrence of slope failures (Guzzetti, 2006). In recent years, researchers have successfully conducted LSM for some counties and river basins in the Loess Plateau using heuristic, deterministic, and statistical approaches (Chen et al., 2014, Chen et al., 2015, Chen et al., 2016a, Chen et al., 2016b, Chen et al., 2017a, Z. Chen et al., 2017, Wang et al., 2015, Wu et al., 2017, Wu and Ke, 2016). Among these, frequency ratio and certainty factor were the most frequently used, but only a limited range of models was applied, and no study of ensemble techniques in this area has been reported in previous literature. Longxian (Shaanxi, China), a county located in the southwest region of the Loess Plateau, has suffered greatly from frequent loess landslides, which makes it a valuable region for LSM research.
The current study aims to explore the potential application of the ADTree and its two ensembles, namely Bagging and AdaBoost, to LSM in Longxian County, and to compare their overall performance. The ADTree-AdaBoost, as a novel model, is proposed here for the first time in landslide prediction analysis. The application of the other two models remains poorly constrained and needs further exploration. This paper provides new ideas and useful information for landslide-related research, and the landslide susceptibility maps can help decision makers better utilize land resources to achieve economic development and bring harmony between people and the fragile loess environment.
Description of the study area
The study area, Longxian County, belongs to Shaanxi Province and lies on the southwestern edge of the Loess Plateau, in the northwest part of China (Fig. 1). Longxian County lies between longitudes 106°26′32″ and 107°08′11″E and latitudes 34°35′17″ and 35°06′45″N, covering an area of approximately 2418 km² with a population of 0.27 million as of 2016. The area comprises approximately 31.3% farmland, 18.0% bareland, 0.4% residential areas, 0.1% water bodies, and 50.2% forest and grass. The climate is
Data used
In the present study, five main datasets were collected: historical landslide records, a digital elevation model (DEM) at a resolution of 30 m, Landsat 8 Operational Land Imager (OLI) images at a spatial resolution of 30 m, Google Earth satellite images, and a lithology map at a scale of 1:200,000. The historical landslide records were sourced from the Nuclear Industry Geological Survey in Shaanxi Province. The DEM and Landsat 8 OLI images were obtained from Geospatial Data Cloud
Evaluation of landslide conditioning factors
The predictive capability of the landslide conditioning factors needs to be evaluated to obtain more accurate landslide susceptibility models, because some factors may have a negative effect on the generated models. Moreover, a strong correlation between factors also degrades model performance (Tien Bui et al., 2016b). Thus, in this study, an attribute evaluator called “CorrelationAttributeEval (correlation attribute evaluation)” was used to obtain the predictive
Landslides distribution pattern
The landslide inventory map and the maps of the eleven landslide conditioning factors were used to obtain the landslide occurrence probability of each class. The frequency ratio (FR) values of each class are summarized in Table 1. For altitude, the highest FR value, 2.314, is found for the 1000–1200 m class, and the value then decreases as altitude increases. Additionally, over 92% of landslides occurred at altitudes below 1200 m, which is also the concentrated area of
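As a worked illustration of the frequency ratio, the sketch below uses the standard definition: the share of landslide pixels falling in a class divided by the share of the study area that the class occupies. The class names and pixel counts are invented for the example and are not taken from Table 1. The function name is hypothetical, and every class is assumed to contain at least one pixel.

```python
def frequency_ratio(class_pixels, class_landslides):
    """FR per class = (landslides in class / all landslides)
                      / (pixels in class / all pixels)."""
    total_pixels = sum(class_pixels.values())
    total_landslides = sum(class_landslides.values())
    fr = {}
    for name, pixels in class_pixels.items():
        landslides = class_landslides.get(name, 0)
        # Same ratio as in the docstring, rearranged into a single division.
        fr[name] = (landslides * total_pixels) / (pixels * total_landslides)
    return fr


# Made-up counts for two hypothetical altitude classes.
class_pixels = {"low": 200, "high": 800}
class_landslides = {"low": 30, "high": 20}

fr = frequency_ratio(class_pixels, class_landslides)
print(fr)
```

An FR above 1 means landslides are over-represented in a class relative to its area, and an FR below 1 means they are under-represented. Here the "low" class holds 60% of the landslides on 20% of the area, giving an FR of 3, while the "high" class gives 0.5.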
Conclusions
In this research, the ADTree and its two ensembles, namely Bagging and AdaBoost, were applied to LSM in Longxian County (China). The ADTree-AdaBoost, as a novel combination model, was applied to landslide susceptibility modeling for the first time. Moreover, the performance of the three models was systematically evaluated and analyzed to select the optimal model for the study area. In sum, the following conclusions can be drawn:
According to integrated analysis of landslide
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We are very thankful to Victor Jetten, editor of the CATENA journal, and two anonymous reviewers for their valuable comments and suggestions to improve the quality of our paper. This study was supported by the National Basic Research Program of China (973 Program, No. 2014CB744701) and the National Natural Science Foundation of China (No. 41072213). The authors acknowledge the PhD scholarships awarded to Yutian Ke (No. 201706180008) and Haoyuan Hong (No. 201906860029) by the China Scholarship Council.
References (101)
- et al. GIS-based landslide susceptibility mapping using analytical hierarchy process (AHP) and certainty factor (CF) models for the Baozhong region of Baoji City, China. Environ. Earth Sci. (2016)
- et al. Landslide susceptibility mapping based on GIS and support vector machine models for the Qianyang County, China. Environ. Earth Sci. (2016)
- et al. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena (2017)
- et al. Performance evaluation of GIS-based new ensemble data mining techniques of adaptive neuro-fuzzy inference system (ANFIS) with genetic algorithm (GA), differential evolution (DE), and particle swarm optimization (PSO) for landslide spatial modelling. Catena (2017)
- et al. Landslide spatial modeling: introducing new ensembles of ANN, MaxEnt, and SVM machine learning techniques. Geoderma (2017)
- et al. GIS-based landslide susceptibility evaluation using a novel hybrid integration approach of bivariate statistical based random forest method. Catena (2018)
- et al. Assessment of susceptibility to earth-flow landslide using logistic regression and multivariate adaptive regression splines: a case of the Belice River basin (western Sicily, Italy). Geomorphology (2015)
- et al. Predictive modelling of rainfall-induced landslide hazard in the Lesser Himalaya of Nepal based on weights-of-evidence. Geomorphology (2008)
- et al. Landslide risk assessment and management: an overview. Eng. Geol. (2002)
- Geological hazards in loess terrain, with particular reference to the loess regions of China. Earth-Sci. Rev. (2001)
- A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci.
- An examination of controls on debris flow mobility: evidence from coastal British Columbia. Geomorphology
- Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology
- Estimating the quality of landslide susceptibility models. Geomorphology
- Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines. Catena
- Landslide susceptibility assessment in Lianhua County (China): a comparison between a random forest data mining technique and bivariate and multivariate statistical models. Geomorphology
- Review on landslide susceptibility mapping using support vector machines. Catena
- Presenting logistic regression-based landslide susceptibility results. Eng. Geol.
- Influence of spatial heterogeneity and temporal variability in habitat selection: a case study on a great bustard metapopulation. Ecol. Model.
- Validation of venous clinical severity score (VCSS) with other venous severity assessment tools from the American venous forum, national venous screening program. J. Vasc. Surg.
- Heavy rainfall triggered loess–mudstone landslide and subsequent debris flow in Tianshui, China. Eng. Geol.
- Distribution and genetic types of loess landslides in China. J. Asian Earth Sci.
- Hybrid integration of Multilayer Perceptron Neural Networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. Catena
- Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin, Iran. Catena
- Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environ. Modell. Softw.
- PMT: New analytical framework for automated evaluation of geo-environmental modelling approaches. Sci. Total Environ.
- A review of statistically-based landslide susceptibility models. Earth-Sci. Rev.
- Optimal landslide susceptibility zonation based on multiple forecasts. Geomorphology
- Landslide susceptibility mapping at central Zab basin, Iran: a comparison between analytical hierarchy process, frequency ratio and logistic regression models. Catena
- Erosion processes in steep terrain—truths, myths, and uncertainties related to forest management in Southeast Asia. For. Ecol. Manage.
- Object-oriented mapping of landslides using Random Forests. Remote Sens. Environ.
- Landslide susceptibility assessment by bivariate methods at large scales: application to a complex mountainous environment. Geomorphology
- Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology
- Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: the influence of models complexity and training dataset size. Catena
- GIS-based landslide susceptibility mapping using analytical hierarchy process and bivariate statistics in Ardesen (Turkey): comparisons of results and confirmations. Catena
- Landslide susceptibility mapping in Injae, Korea, using a decision tree. Eng. Geol.
- Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides (Tokat—Turkey). Comput. Geosci.
- Study of the 1920 Haiyuan earthquake-induced landslides in loess (China). Eng. Geol.
- A rapid loess flowslide triggered by irrigation in China. Landslides
- Landslide susceptibility mapping for a landslide-prone area (Findikli, NE of Turkey) by likelihood-frequency ratio and weighted linear combination models. Environ. Geol.
- Understanding diagnostic tests 3: receiver operating characteristic curves. Acta Paediatr.
- A novel ensemble decision tree-based CHi-squared Automatic Interaction Detection (CHAID) and multivariate logistic regression models in landslide susceptibility mapping. Landslides
- A novel integrated model for assessing landslide susceptibility mapping using CHAID and AHP pair-wise comparison. Int. J. Remote Sens.
- Landslide susceptibility mapping using GIS-based weighted linear combination, the case in Tsugawa area of Agano River, Niigata Prefecture, Japan. Landslides
- Bagging predictors. Machine Learning
- Spatial prediction models for landslide hazards: review, comparison and evaluation. Nat. Hazards Earth Syst. Sci.
- GIS techniques and statistical models in evaluating landslide hazard. Earth Surf. Proc. Land.
- Application of weights-of-evidence model in landslide susceptibility mapping at Baozhong Region in Baoji, China. Environ. Geol.
- Landslide susceptibility mapping based on GIS and information value model for the Chencang District of Baoji, China. Arab. J. Geosci.
- Application of frequency ratio, statistical index, and index of entropy models and their comparison in landslide susceptibility mapping for the Baozhong Region of Baoji, China. Arab. J. Geosci.