Machine learning methods for landslide susceptibility studies: A comparative overview of algorithm performance

doi:10.1016/j.earscirev.2020.103225

Earth-Science Reviews

Volume 207, August 2020, 103225

https://doi.org/10.1016/j.earscirev.2020.103225

Abstract

Landslides are one of the catastrophic natural hazards that occur in mountainous areas, leading to loss of life, damage to properties, and economic disruption. Landslide susceptibility models prepared in a Geographic Information System (GIS) integrated environment can be key for formulating disaster prevention measures and mitigating future risk. The accuracy and precision of susceptibility models are evolving rapidly as the field moves from opinion-driven models and statistical learning toward increased use of machine learning techniques. Critical reviews on opinion-driven models and statistical learning in landslide susceptibility mapping have been published, but an overview of current machine learning models for landslide susceptibility studies, including background information on their operation, implementation, and performance, is currently lacking. Here, we present an overview of the most popular machine learning techniques available for landslide susceptibility studies. We find that only a handful of researchers use machine learning techniques in landslide susceptibility mapping studies. Therefore, we present the architecture of various Machine Learning (ML) algorithms in plain language, so as to be understandable to a broad range of geoscientists. Furthermore, a comprehensive study comparing the performance of various ML algorithms is absent from the current literature, making an assessment of comparative performance and predictive capabilities difficult. We therefore undertake an extensive analysis and comparison between different ML techniques using a case study from Algeria. We summarize and discuss the algorithms' accuracies, advantages, and limitations using a range of evaluation criteria. We note that tree-based ensemble algorithms achieve excellent results compared to other machine learning algorithms and that the Random Forest algorithm offers robust performance for accurate landslide susceptibility mapping with only a small number of adjustments required before training the model.

Introduction

Landslides are a cascading geo-hazard that can have significant impacts on human lives and settlements worldwide, and are a driving force in landscape evolution (Fan et al., 2019). In recent years, the socioeconomic impacts of landslides have been exacerbated through global economic expansion, unplanned developmental activities, and aggravated climate change (Guzzetti et al., 2012; Guzzetti et al., 1999; Li et al., 2020a).

The processes that modulate the spatial and temporal occurrence of landslides include strong earthquakes, heavy precipitation, snowmelt, land-use changes, and other anthropogenic activities (Chen et al., 2020a; Dai et al., 2002; Dou et al., 2015c; Jaboyedoff et al., 2012; Kawagoe et al., 2010). Landslide hazards and their associated risk have been studied in detail, due to their destructive nature and socioeconomic impacts. In areas with considerable risk from landslides, susceptibility maps are a fundamental step toward hazard assessment and mitigation strategies (Dou et al., 2019c; Li et al., 2019; Van Westen et al., 2006). Such procedures are typically followed in landslide assessments and mitigation at regional or catchment scale. The use of a Geographic Information System (GIS) environment in landslide susceptibility map preparation is an effective method to identify and delineate landslide-prone areas in order to create a geospatial database of landslide occurrence, or ‘landslide inventory’. Using GIS data sources, geospatial properties of the landslide locations that may affect potential slope stability, known as Landslide Conditioning Factors (LCF; e.g., slope angle, slope aspect, soil type, rainfall, topographic wetness, and lithology type), can be compiled into a database. The LCF data can then be used to model the responses of other slopes in the study area in an attempt to predict future landslide occurrence.

Different terminologies have been applied to this method of mapping and modeling over time. The term ‘landslide susceptibility mapping’ has been used in earlier studies to specifically refer to the process of identifying and mapping sites of historic landslides. More recently, it has included trying to predict locations of future events through modeling approaches, an approach referred to as ‘landslide susceptibility modeling’. In this paper, we use the acronym LSM to refer to the entire process of mapping and modeling the susceptibility of slopes to future landslides. LSM is a key part of disaster management strategies, as it produces a map of probabilities of landslide occurrence in a geographical region. According to Brabb (1984), landslide susceptibility is defined as “the likelihood of a landslide occurring in a given area” based on the given topographical and environmental variables (i.e., LCF), and an LSM approach identifies areas in which landslides are likely to occur (Guzzetti et al., 1999).

Decision-makers and local agencies use LSM probability evaluation in order to partition the geographic surface into zones with different degrees of stability and instability. This process, known as ‘landslide risk zoning’, plays a significant role in the control, management, and counter-measures for mitigating the risks associated with known and potential future landslides. Using GIS in this approach can improve the handling of spatial data, increase processing capability, and aid the decision-making process.
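As a rough illustration of the zoning step, the sketch below maps modeled landslide probabilities to named susceptibility zones. The five class names and the equal-interval breaks at 0.2, 0.4, 0.6, and 0.8 are hypothetical choices made for this example; they are not taken from the review, and in practice the breaks would be set by the analyst or agency.

```python
from bisect import bisect_right

# Hypothetical equal-interval class breaks and labels (assumptions for illustration).
BREAKS = [0.2, 0.4, 0.6, 0.8]
LABELS = ["very low", "low", "moderate", "high", "very high"]


def susceptibility_zone(probability):
    """Return the zone label for a modeled landslide probability in [0, 1]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_right counts how many breaks are <= probability,
    # which is exactly the index of the zone the value falls into.
    return LABELS[bisect_right(BREAKS, probability)]


# Probabilities for four example mapping units (made-up values).
zones = [susceptibility_zone(p) for p in (0.05, 0.35, 0.6, 0.93)]
print(zones)
```

A value that sits exactly on a break (0.6 here) is assigned to the higher zone, which is the more cautious convention for hazard mapping.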

Despite different statistical approaches, terminologies, and computation capability, LSM primarily aims at highlighting the spatial distribution of landslides based upon the following assumptions: (i) the past is the key to the future, implying that future events will likely happen in conditions similar to those of past events; and (ii) LCF affecting landslide occurrence are spatially linked and therefore can be used in predictive functions (Reichenbach et al., 2018). These predictive functions can therefore be implemented through the compilation of a landslide inventory and associated geospatial LCF data.
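To make the idea of a predictive function concrete, the following minimal sketch fits a logistic regression to a synthetic landslide inventory with two conditioning factors and returns a landslide probability for any mapping unit. Everything in it is an assumption for illustration: the two factors (normalized slope angle and topographic wetness), the rule used to generate the labels, the learning rate, and the number of iterations. It uses plain gradient descent rather than any particular library and is not the procedure followed in the review.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - learning_rate * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b / n_samples
    return w, b


def predict_proba(w, b, x):
    """Modeled probability that a mapping unit with factors x hosts a landslide."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic inventory: 1 = landslide, 0 = no landslide (made-up generating rule).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    slope = rng.uniform(0.0, 1.0)    # normalized slope angle
    wetness = rng.uniform(0.0, 1.0)  # normalized topographic wetness
    label = 1 if slope + 0.5 * wetness + rng.gauss(0.0, 0.1) > 0.8 else 0
    X.append([slope, wetness])
    y.append(label)

w, b = fit_logistic(X, y)
steep_wet = predict_proba(w, b, [0.9, 0.9])
flat_dry = predict_proba(w, b, [0.1, 0.1])
print(f"steep and wet: {steep_wet:.2f}, flat and dry: {flat_dry:.2f}")
```

The fitted function is only as good as assumption (i): it reproduces the conditions under which past landslides were recorded and cannot anticipate failures under conditions absent from the inventory.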

Consequently, different quantitative techniques and approaches have been developed for LSM. Broadly, there are four main types of LSM approach: physical-based models, opinion-driven (i.e., heuristic) models, statistical models and, more recently, machine learning (ML) models (Chang et al., 2019; Nguyen et al., 2019; Pham et al., 2019; Tien Bui et al., 2019; Li et al., 2019). Each of these individual approaches has been shown to have its own advantages and limitations (Bergstra et al., 2013; Khosravi et al., 2019). For instance, physical-based models involving detailed site characterization currently deliver the highest prediction accuracy, and are suited for local-area (i.e., sub-catchment) scale mapping and analysis. Such models require a detailed understanding of the landslide system derived from local surface and subsurface observations and monitoring systems, and are typically employed to provide early warning of impending slope failure (Piciullo et al., 2018; Whiteley et al., 2019). However, for large-scale analysis (i.e., watershed/basin scale through to county/provincial/country level), physical-based models require large amounts of detailed data to provide reliable results, which comes with excessive financial and computational cost. Therefore, physical-based models are currently not practical for large-area risk zonation exercises. For this reason, knowledge-based models and statistical models, which are modulated by limited information on terrain and environmental variables, have dominated the arena of LSM over the past 40 years (Guzzetti et al., 2012). Opinion-driven models are built by structuring a model from limited information and then parameterizing it by ranking and/or weighting the landslide conditioning factors according to expert opinion. This approach can be problematic because the result is hard to quantify or evaluate objectively. Statistical models, on the other hand, have benefited the most from the advancements in GIS in the last decade, and consequently a plethora of quantitative methods and techniques have been proposed and implemented successfully for modeling landslides that aid in understanding landslide patterns and their triggering mechanisms (Dou et al., 2019b; Thai et al., 2016). Since the early days of statistical predictive modeling, the progression in understanding landslide susceptibility has been astonishingly rapid. In the last two decades, many different landslide susceptibility models emerging from various statistical approaches have been employed in the ML environment to obtain accurate risk zonation maps.

The margin between statistical models and ML is a subject of debate (Bzdok et al., 2018). The association and differences between statistical and ML modeling approaches are not well explained in the landslide susceptibility literature, primarily because producing and delivering accurate LSM results is a higher priority for geoscientists and geo-researchers than defining and classifying algorithms. By definition, ML learns from data without relying on rule-based functions, whereas statistical modeling formalizes relationships between variables in the data by means of mathematical equations. Although in the past the two fields were considered exclusive (Fig. 1a), they have converged in recent times (Fig. 1b). A case in point is the use of Logistic Regression (LR) algorithms, initially a statistical model for solving binary classification problems. LR was borrowed by ML from the field of statistical models and is currently one of the most widely used ML algorithms. Similarly, Bootstrap (Kulesa et al., 2015) is a method used in statistical inference, but is also applied regularly in Random Forest (RF) algorithms. Nevertheless, ML emphasizes optimization and performance rather than inference, which is the primary concern of statistical models. Because ML is still new to LSM, there are few instructive resources aimed at those who are not experts in ML; it is therefore timely to prepare an overview of the different ML algorithms available and to provide a comparison between them for LSM.
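The bootstrap mentioned above can be sketched in a few lines: a sample of the same size as the inventory is drawn with replacement, so some records appear more than once and others are left out. In a Random Forest each tree is trained on one such sample, and the records left out (the 'out-of-bag' records) can serve as a built-in check. The inventory size of 1000 and the random seed below are arbitrary assumptions for the example.

```python
import random


def bootstrap_sample(n_items, rng):
    """Draw n_items indices with replacement; also return the indices never drawn."""
    drawn = [rng.randrange(n_items) for _ in range(n_items)]
    out_of_bag = sorted(set(range(n_items)) - set(drawn))
    return drawn, out_of_bag


rng = random.Random(42)
n_items = 1000  # hypothetical number of records in a landslide inventory
drawn, out_of_bag = bootstrap_sample(n_items, rng)
oob_fraction = len(out_of_bag) / n_items
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

For large samples the expected out-of-bag fraction approaches 1/e, or roughly 37%, which is why each tree in a forest sees only about two-thirds of the distinct records.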

In the literature, several studies have implemented different ML algorithms for LSM (Camilo et al., 2017; Pham et al., 2019; Tien Bui et al., 2019). Machine learning has flourished in other fields of science since the 1990s, with major developments including the implementation of neural networks, development of boosting algorithms, and increased accessibility to internet-derived and digital data. Consequently, ML was first used in the field of landslides in the early to mid-2000s (Fig. 2). Logistic Regression (LR) and Artificial Neural Network (NNET) algorithms were the earliest ML methods applied to LSM and have total article counts of 1587 and 746, respectively, since 2000. More recently, in the search for more accurate LSM products, researchers have used highly sophisticated algorithms such as Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF) algorithms, with their popularity increasing from 2010 onward. There have been 342 publications using SVM, 247 using DT, and 169 using RF techniques since 2000. Other ML algorithms are rarely applied in LSM (Fig. 2) for two likely reasons: (i) SVM, DT, and RF attain over 90% prediction accuracy, which can currently be seen as a realistic upper limit in LSM modeling (Chang et al., 2019; Dou et al., 2020a); and (ii) other ML models have been developed more recently and are more complex to apply, requiring advanced knowledge of ML processing to implement successfully.

The literature surrounding specific aspects of landslide hazard and landslide susceptibility has grown over the past three decades. Several benchmark publications presenting case studies, models, and reviews on the susceptibility mapping and modeling process have been identified. Among them, the 15 top-cited publications are listed in Fig. 3, with most of this literature published before 2010. Noticeably, only a handful of researchers are involved in investigating the complexity of susceptibility models using ML models. A few studies have reviewed progress in the wider area of modeling (e.g., Budimir et al., 2015; Reichenbach et al., 2018; Rossi et al., 2010). Based on the Web of Science (WoS) database, we identify the ten authors whose contributions account for the largest proportion of all published literature on ML for LSM studies (Fig. 4). These ten authors are responsible for approximately 30% of published LR studies, 47% of published NNET studies, 70% of published RF studies, 83% of published DT studies, and 86% of published SVM studies. Although SVM, RF, and DT are more recent additions to the range of ML models available for LSM, the article share of these researchers is significant. Among the top five authors, four are affiliated with institutes in Malaysia, Norway, Iran, and Vietnam, underscoring the sizeable population of articles from these nations.

Furthermore, Fig. 5 shows publications using ML in LSM by country, and displays the countries with the most publications. In the WoS database, ‘country’ refers to the location of the author's affiliation, rather than the location of LSM studies. Although it is not always the case, the author affiliation often reflects the study area. For example, China tops the list with the most publications in all kinds of ML for LSM, and also has one of the highest incidences of landslide occurrence in the world (Kirschbaum et al., 2010). On the other hand, the Netherlands, where one-third of the land lies below sea level, does not experience significant risk from landslide hazards, but is listed 16th in Fig. 5. This can be attributed to the research outcomes from graduates and researchers at the International Institute for Geo-Information Science and Earth Observation (ITC), University of Twente, a leading research institute in GIS applications for natural hazards. Nepal, which topped the list of countries with the highest percentage of landslide reports in Kirschbaum et al. (2010), is not found in the top 18 countries using ML for LSM studies. This suggests an absence of ML researcher affiliation within the country.

The top journals and their percentage share of publications using ML for LSM are shown in Fig. 6. Journals such as Environmental Earth Sciences (EES), Geomorphology (GEM), Landslides (LAN), Catena (CAT), and Geomatics, Natural Hazards and Risk (GNH) are common choices for studies using ML for LSM. The next most popular choices are Engineering Geology (ENG) and Natural Hazards (NAH). Studies using advanced ML techniques such as SVM, DT, and RF are found in the journals Remote Sensing (REM) and Science of the Total Environment (SCT). Regional works on LSM using earlier ML techniques such as LR and NNET are also popular in Arabian Journal of Geosciences (ARA) and Natural Hazards and Earth System Sciences (NHE).

Overall, 42% of the articles using LR techniques for LSM in the WoS database were published in the ten journals listed in Fig. 6a. In addition, 49% of publications using NNET and/or LSM methods (Fig. 6b), 48% using SVM and/or LSM (Fig. 6c), 47% using DT and/or LSM (Fig. 6d), and 49% using RF techniques (Fig. 6e) were published in these journals, all of which are relevant to the study of natural hazards and geomorphology. Nevertheless, no comprehensive review has focused exclusively on the use of ML in LSM in order to present the complexities, comparisons, challenges, and opportunities for the future. Hence, this review builds upon the aforementioned body of literature (Fig. 3).

The concepts and terminology surrounding ML and its applications in LSM can be unfamiliar to geoscientists and geomorphologists without computing and statistical backgrounds, and therefore, Section 2 is devoted to detailing the architecture of the most popular ML algorithms used for landslide susceptibility studies. The ML algorithms presented include Logistic Regression, Artificial Neural Network, Support Vector Machine, Decision Tree, Random Forest, Naïve Bayes, Quadratic Discriminant Analysis, K-Nearest Neighbors, and Gradient Boosting algorithms.

To date, no consensus about which ML algorithm is the ‘best’ suited for predicting landslide-prone areas has been identified (Dou et al., 2020a; Y. Li et al., 2020b; Sevgen et al., 2019). It has been postulated in many studies that the prediction accuracy of landslide modeling is influenced by not only the quality of data behind landslide inventories and landslide conditioning factors but also the fundamental quality of the ML algorithm used (Nhu et al., 2020; Yilmaz, 2009). Therefore, Section 3 assesses and compares the prediction capabilities of different ML algorithms for LSM approaches by considering a case study from Algeria.

When advanced ML techniques are used, prediction results can attain accuracies in excess of 90% (e.g., Dou et al., 2019a). However, researchers are still aiming to develop and apply additional models to produce more accurate outputs. Section 4 focuses on discussing the performance of the ML models and presents the challenges, limitations and future opportunities for using ML methods in LSM. The concluding remarks from this review are presented in Section 5.
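For readers unfamiliar with how such accuracy figures are obtained, the sketch below computes two measures from a set of known outcomes and modeled probabilities. The choice of measures is an assumption: this excerpt does not list the review's evaluation criteria, and overall accuracy and the area under the ROC curve (AUC) are used here simply because they are widely reported. The eight labels and probabilities are made-up values.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of mapping units classified correctly at the given probability threshold."""
    predicted = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for t, p in zip(y_true, predicted) if t == p)
    return correct / len(y_true)


def auc(y_true, y_prob):
    """AUC as the chance that a random landslide unit outranks a random
    non-landslide unit; tied probabilities count as half a win."""
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up validation set: 1 = landslide, 0 = no landslide.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
acc = accuracy(y_true, y_prob)
area = auc(y_true, y_prob)
print(f"accuracy = {acc}, AUC = {area}")  # accuracy = 0.75, AUC = 0.9375
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, so the two can disagree about which model is better.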

Section snippets

Machine learning model architecture

Machine learning techniques have proven to be a standard solution for addressing big-data spatial analytics where the extent of the theoretical knowledge of a problem is incomplete (Lary et al., 2016) and when statistical pre-assumptions are unreliable or not known (Dou et al., 2019a). Due to these factors, and combined with their robustness as one of the ideal techniques for solving non-linear geo-environmental issues, ML techniques are increasingly used in LSM. Using either regression or

Comparative analysis of Machine Learning Models in Landslide Susceptibility Studies

Wolpert (1996) introduced the ‘No free lunch’ (NFL) concept, which was summarized as “any two algorithms are equivalent when their performance is averaged across all possible problems”. The NFL concept applies to the current state of ML modeling in general and spatial prediction of landslides in particular as “no single or particular model can be depicted as the most suitable for all case scenarios”. This is because of the difficulty in assessing whether an implemented ML model provides a

Discussion, challenges, and future directions

Regional landslide susceptibility mapping is a hot topic, due to the constant risks posed in many parts of the world. It is a critical step in the prediction and mitigation of future landslide occurrence, but requires substantial resources and can be difficult to implement due to the non-linear characteristics of LSM datasets. Although various methodologies for producing landslide susceptibility maps have been developed, the prediction accuracy of these methods is still debated (Su et al., 2017

Concluding remarks

In this article, we have provided a summary of machine learning models used for landslide susceptibility modeling, including identifying the recent trends in the use of ML methods for LSM, and presenting the basic architecture of the most popular ML methods. Subsequently, we formulated a comprehensive framework for comparing and assessing machine learning models to identify areas susceptible to the occurrence of landslides. This was achieved by systematically passing different landslide

Glossary

Bayes' theorem - Also ‘Bayes’ law’ or ‘Bayes' rule’, describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
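Written out, the theorem takes the following standard form. The landslide reading given after it is an illustrative assumption rather than a definition taken from the review.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```

If A is taken as 'a landslide occurs in a mapping unit' and B as 'the unit has a given set of conditioning factor values', then P(A) is the prior landslide frequency, P(B | A) is how often those factor values are seen at known landslide sites, and P(A | B) is the updated probability of a landslide given the observed factors.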

Bagging – See Bootstrap.

Black-box model – A common metaphor used in computer programming referring to a system for which we can only observe the inputs and outputs, but not the internal workings (see also White-box models).

Bootstrap - The bootstrap (‘bootstrap aggregating’ or simply ‘bagging’) method is a resampling technique used

Author contributions

DJ, AB, and YAP were responsible for coordinating with all co-authors. AB performed the analysis with contribution from DJ and YAP. AB, YAP, and DJ generated most of the figures, with input from all authors. AB, DJ, YA, JW, BTP, DTB, RA, BA contributed to writing and provided helpful discussions.

Declaration of Competing Interest

No conflict of interest exists.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (No.41827808) and open fund (SKHL1903) from State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, JSPS Program, and CAS Pioneer Hundred Talents Program. Authors sincerely thank the Editor Shuhab Khan, and the two reviewers for their constructive and detailed comments. Jim Whiteley publishes with the permission of the Executive Director, British Geological Survey (UKRI-NERC).

References (151)

C. Aldrich et al. The application of neural nets in the metallurgical industry. Miner. Eng. (1994)
P.M. Atkinson et al. Generalised linear modelling of susceptibility to landsliding in the central Apennines, Italy. Comput. Geosci. (1998)
L. Ayalew et al. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. (2005)
J.J. Buckley et al. Fuzzy neural networks: a survey. Fuzzy Sets Syst. (1994)
M.J. Cracknell et al. Geological mapping using remote sensing data: a comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Comput. Geosci. (2014)
F. Dai et al. Landslide risk assessment and management: an overview. Eng. Geol. (2002)
J. Dou et al. Assessment of advanced random forest and decision tree algorithms for modeling rainfall-induced landslide susceptibility in the Izu-Oshima Volcanic Island, Japan. Sci. Total Environ. (2019)
J. Dou et al. Different sampling strategies for predicting landslide susceptibilities are deemed less consequential with deep learning. Sci. Total Environ. (2020)
Y. Freund et al. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. (1997)
J.N. Goetz et al. Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. Comput. Geosci. (2015)
Q. Guo et al. Support vector machines for predicting distribution of Sudden Oak Death in California. Ecol. Model. (2005)
F. Guzzetti et al. Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology (1999)
F. Guzzetti et al. Landslide inventory maps: New tools for an old problem. Earth-Science Rev. (2012)
C. Li et al. Susceptibility of reservoir-induced landslides and strategies for increasing the slope stability in the Three Gorges Reservoir Area: Zigui Basin as an example. Eng. Geol. (2019)
I. Kaastra et al. Designing a neural network for forecasting financial and economic time series. Neurocomputing (1996)
K. Khosravi et al. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. (2018)
K. Khosravi et al. A comparative assessment of flood susceptibility modeling using Multi-Criteria Decision-Making Analysis and Machine Learning Methods. J. Hydrol. (2019)
D.J. Lary et al. Machine learning in geosciences and remote sensing. Geosci. Front. (2016)
M.F. Møller. A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw. (1993)
E. Alpaydin. Introduction to Machine Learning. (2009)
C. Ballabio et al. Support Vector Machines for Landslide Susceptibility Mapping: the Staffora River Basin Case Study, Italy. Math. Geosci. (2012)
D. Basak et al. Support vector regression. Neural Inf. Process. Rev. (2007)
J. Bergstra et al. Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures.
E.E. Brabb. Innovative approaches to landslide hazard mapping.
L. Breiman. Random forests. Mach. Learn. (2001)
L. Breiman et al. Random Forests: Finding Quasars.
J. Bröcker et al. Increasing the Reliability of Reliability Diagrams. Weather Forecast. (2007)
M.E.A. Budimir et al. A systematic review of landslide probability mapping using logistic regression. Landslides (2015)
D. Bzdok et al. Statistics versus machine learning. Nat. Publ. Gr. (2018)
D.C. Camilo et al. Handling high predictor dimensionality in slope-unit-based landslide susceptibility models through LASSO-penalized Generalized Linear Model. Environ. Model. Softw. (2017)
F. Catani et al. Landslide susceptibility estimation by random forests technique: sensitivity and scaling issues. Nat. Hazards Earth Syst. Sci. (2013)
K.-T. Chang et al. Evaluating scale effects of topographic variables in landslide susceptibility models using GIS-based machine learning techniques. Sci. Rep. (2019)
W. Chen et al. GIS-based landslide susceptibility modelling: a comparative assessment of kernel logistic regression, Naïve-Bayes tree, and alternating decision tree models. Geomat. Nat. Haz. Risk (2017)
Y. Chen et al. Relationship between water content, shear deformation, and elastic wave velocity through unsaturated soil slope. Bull. Eng. Geol. Environ. (2020)
J.S. Chen et al. A kNN based position prediction method for SNS places.
V. Cherkassky et al. Selection of meta-parameters for support vector regression.
W. Chettah et al. Investigation des propriétés minéralogiques et géomécaniques des terrains en mouvement dans la ville de Mila «Nord-Est d'Algérie». Sci. la terre l'univers (2009)
C.-M. Chu et al. Integrating Decision Tree and Spatial Cluster Analysis for Landslide Susceptibility Zonation. World Acad. Sci. Eng. Technol. (2009)
M. Clerc et al. The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Trans. Evol. Comput. (2002)
A. Cobham. The intrinsic computational difficulty of functions.
P.-E. Coiffait. Un bassin post-nappes dans son cadre structural: l'exemple du bassin de Constantine (Algérie Nord-Orientale). (1992)
C. Cortes et al. Support-vector networks. Mach. Learn. (1995)
N. Cristianini et al. Support Vector Machines and Kernel Methods: The New Generation of Learning Machines. Artif. Intell. Mag. (2002)
C.F. Dormann et al. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography (Cop.) (2012)
J. Dou et al. Optimization of causative factors for landslide susceptibility evaluation using remote sensing and GIS data in parts of Niigata, Japan. PLoS One (2015)
J. Dou et al. Shallow and Deep-Seated Landslide Differentiation using support Vector Machines: a Case Study of the Chuetsu Area, Japan. Terr. Atmos. Ocean. Sci. (2015)
J. Dou et al. An integrated artificial neural network model for the landslide susceptibility assessment of Osado Island, Japan. Nat. Hazards (2015)
J. Dou et al. Evaluating GIS-Based Multiple Statistical Models and Data Mining for Earthquake and Rainfall-Induced Landslide Susceptibility Using the LiDAR DEM. Remote Sens. (2019)
J. Dou et al. Torrential rainfall-triggered shallow landslide characteristics and susceptibility assessment using ensemble data-driven models in the Dongjiang Reservoir Watershed, China. Nat. Hazards (2019)
J. Dou et al. Improved landslide assessment using support vector machine with bagging, boosting, and stacking ensemble machine learning framework in a mountainous watershed, Japan. Landslides (2020)

Cited by (518)

Feature adaptation for landslide susceptibility assessment in “no sample” areas
2024, Gondwana Research
Given the time-consuming nature of compiling landslide inventories, it is increasingly important to develop transferable landslide susceptibility models that can be applied to regions without existing data. In this study, we propose a feature-based domain adaptation method to improve the transferability of landslide susceptibility models, especially in “no sample” areas. Two typical landslide-prone areas in Fujian province, southeastern China, were chosen as research cases to test the practicality of the transfer effect. Five conventional machine learning algorithms (Support vector machines (SVM), Random Forest (RF), Logistic Regression (LOG), K-nearest neighbor (KNN), and Decision tree (C4.5)) are used to model landslide susceptibility in sampled areas (source domain), and a feature transfer-based landslide susceptibility evaluation model is constructed under coupled feature transfer methods to evaluate the susceptibility of landslide in un-sampled areas (target domain). The results showed that feature transfer can effectively improve the transferability of different machine learning models for cross-regional prediction (The indicators have improved overall by 8.49%), with SVM (increased by 13.68%) and LOG (increased by 10.19%) models showing the most significant improvements. The feature-based domain adaptive method can alleviate the burden of collecting and labeling new data, and effectively improve the assessment performance of machine learning-based landslide susceptibility models in un-sampled areas. This is a new solution for landslide susceptibility assessment in completely “no sample” areas.
Enhancing landslide susceptibility mapping incorporating landslide typology via stacking ensemble machine learning in Three Gorges Reservoir, China
2024, Geoscience Frontiers
Different types of landslides exhibit distinct relationships with environmental conditioning factors. Therefore, in regions where multiple types of landslides coexist, it is required to separate landslide types for landslide susceptibility mapping (LSM). In this paper, a landslide-prone area located in Chongqing Province within the middle and upper reaches of the Three Gorges Reservoir area (TGRA), China, was selected as the study area. 733 landslides were classified into three types: reservoir-affected landslides, non-reservoir-affected landslides, and rockfalls. Four landslide inventory datasets and 15 landslide conditional factors were trained by three Machine Learning models (logistic regression, random forest, support vector machine), and a Deep Learning (DL) model. After comparing the models using receiver operating characteristics (ROC), the landslide susceptibility indexes of three types landslides were acquired by the best performing model. These indexes were then used as input to generate the final map based on the Stacking method. The results revealed that DL model showed the best performance in LSM without considering landslide types, achieving an area under the curve (AUC) of 0.854 for testing and 0.922 for training. Moreover, when we separated the landslide types for LSM, the AUC improved by 0.026 for testing and 0.044 for training. Thus, this paper demonstrates that considering different landslide types in LSM can significantly improve the quality of landslide susceptibility maps. These maps in turn, can be valuable tools for evaluating and mitigating landslide hazards.
On the use of explainable AI for susceptibility modeling: Examining the spatial pattern of SHAP values
2024, Geoscience Frontiers
Hydro-morphological processes (HMP, any natural phenomenon contained within the spectrum defined between debris flows and flash floods) are globally occurring natural hazards which pose great threats to our society, leading to fatalities and economical losses. For this reason, understanding the dynamics behind HMPs is needed to aid in hazard and risk assessment. In this work, we take advantage of an explainable deep learning model to extract global and local interpretations of the HMP occurrences across the whole Chinese territory. We use a deep neural network architecture and interpret the model results through the spatial pattern of SHAP values. In doing so, we can understand the model prediction on a hierarchical basis, looking at how the predictor set controls the overall susceptibility as well as doing the same at the level of the single mapping unit. Our model accurately predicts HMP occurrences with AUC values measured in a ten-fold cross-validation ranging between 0.83 and 0.86. This level of predictive performance attests for an excellent prediction skill. The main difference with respect to traditional statistical tools is that the latter usually lead to a clear interpretation at the expense of high performance, which is otherwise reached via machine/deep learning solutions, though at the expense of interpretation. The recent development of explainable AI is the key to combine both strengths. In this work, we explore this combination in the context of HMP susceptibility modeling. Specifically, we demonstrate the extent to which one can enter a new level of data-driven interpretation, supporting the decision-making process behind disaster risk mitigation and prevention actions.
Landslide susceptibility mapping based on the reliability of landslide and non-landslide sample
2024, Expert Systems with Applications
Spatial data sampling can improve performance in geo-spatial prediction. However, measuring the reliability of polygon-based data in the sampling process is still a challenge. In this study, a reliability-based sampling (RBS) method was proposed to address this problem and was applied to landslide susceptibility mapping. First, the landslide prototype was extracted from landslide polygon data. Then, the reliability of landslide samples and non-landslide samples was measured using the similarity in environmental factors between the candidate samples and the prototype. The mutual exclusion reliability threshold setting method was used to collect the landslide samples and non-landslide samples whose reliability exceeds a certain threshold. A case study demonstrates that the RBS method outperforms an existing representative method (i.e., Landslide entity) in terms of Accuracy and AUC across different sample sizes. In summary, RBS is an efficient method for improving the spatial pattern of samples and can also be applied to other geo-spatial predictions.
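The sampling idea summarized above (a prototype, a similarity-based reliability, and a threshold that keeps the two sample sets mutually exclusive) can be sketched as follows. This is an illustrative reading of the abstract, not the published RBS algorithm. The mean-vector prototype, the `1 / (1 + distance)` similarity, the use of `1 - similarity` as non-landslide reliability, and all names and numbers are assumptions.

```python
import math


def prototype(vectors):
    """Assumed prototype: mean environmental-factor vector of cells inside landslide polygons."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def similarity(vec, proto):
    """Assumed similarity in (0, 1]: equals 1 when the candidate matches the prototype."""
    return 1.0 / (1.0 + math.dist(vec, proto))


def select_samples(candidates, proto, threshold):
    """Keep candidates whose reliability reaches the threshold.

    Landslide reliability is taken as the similarity and non-landslide
    reliability as 1 - similarity (both are assumptions). With a threshold
    above 0.5 a candidate can never qualify for both sets.
    """
    landslide, non_landslide = [], []
    for vec in candidates:
        sim = similarity(vec, proto)
        if sim >= threshold:
            landslide.append(vec)
        elif 1.0 - sim >= threshold:
            non_landslide.append(vec)
    return landslide, non_landslide


# Hypothetical factor vectors (two environmental factors per cell).
inside_polygon = [[1.0, 2.0], [3.0, 4.0]]
proto = prototype(inside_polygon)
candidates = [[2.0, 3.0], [5.0, 7.0], [2.5, 3.0]]
landslide, non_landslide = select_samples(candidates, proto, threshold=0.8)
print(landslide, non_landslide)
```

In this toy run the third candidate is discarded because it is neither similar enough to count as a reliable landslide sample nor dissimilar enough to count as a reliable non-landslide sample.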
Gully erosion mapping susceptibility in a Mediterranean environment: A hybrid decision-making model
2024, International Soil and Water Conservation Research
Gully erosion is one of the main natural hazards, especially in arid and semi-arid regions, damaging ecosystem services and human well-being. Thus, gully erosion susceptibility maps (GESM) are urgently needed for identifying priority areas in which appropriate measures should be considered. Here, we proposed four new hybrid machine learning models, namely weight of evidence - Multilayer Perceptron (MLP-WoE), weight of evidence - K Nearest Neighbours (KNN-WoE), weight of evidence - Logistic Regression (LR-WoE), and weight of evidence - Random Forest (RF-WoE), for mapping gully erosion, exploring the opportunities of GIS tools and remote sensing techniques in the El Ouaar watershed located in the Souss plain in Morocco. Inputs of the developed models are composed of the dependent variable (i.e., gully erosion points) and a set of independent variables. In this study, a total of 314 gully erosion points were identified in the study area and randomly split into 70% for the training stage (220 gullies) and 30% for the validation stage (94 gullies). 12 conditioning variables including elevation, slope, plane curvature, rainfall, distance to road, distance to stream, distance to fault, TWI, lithology, NDVI, and LU/LC were used based on their importance for gully erosion susceptibility mapping. We evaluate the performance of the above models based on the following statistical metrics: accuracy, precision, and area under the curve (AUC) values of the receiver operating characteristic (ROC). The results indicate that the RF-WoE model showed good accuracy (AUC = 0.8), followed by KNN-WoE (AUC = 0.796), then MLP-WoE (AUC = 0.729) and LR-WoE (AUC = 0.655). Gully erosion susceptibility maps provide information and a valuable tool for decision-makers and planners to identify areas where urgent and appropriate interventions should be applied.
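As a rough illustration of the weight-of-evidence component of these hybrid models, the sketch below computes the standard positive weight, negative weight, and contrast for one class of a conditioning factor, and re-checks the 70/30 split reported in the abstract. The cell counts are invented, and how the study feeds the weights into each classifier is not stated here, so that step is left out.

```python
import math


def weight_of_evidence(class_events, class_cells, total_events, total_cells):
    """Return (W+, W-, contrast) for one factor class.

    W+ = ln(P(B|D) / P(B|not D)) and W- = ln(P(not B|D) / P(not B|not D)),
    where B is "cell lies in the class" and D is "cell contains a gully".
    """
    non_events = total_cells - total_events
    class_non_events = class_cells - class_events
    p_b_d = class_events / total_events
    p_b_nd = class_non_events / non_events
    p_nb_d = (total_events - class_events) / total_events
    p_nb_nd = (non_events - class_non_events) / non_events
    w_plus = math.log(p_b_d / p_b_nd)
    w_minus = math.log(p_nb_d / p_nb_nd)
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts: 1000 cells, 100 with gullies; one slope class
# covers 200 cells, 50 of which contain gullies.
w_plus, w_minus, contrast = weight_of_evidence(50, 200, 100, 1000)
print(f"W+ = {w_plus:.3f}, W- = {w_minus:.3f}, C = {contrast:.3f}")

# Split reported in the abstract: 70% of 314 points for training.
n_total = 314
n_train = round(0.7 * n_total)
n_valid = n_total - n_train
print(n_train, n_valid)
```

A positive contrast indicates that the class is spatially associated with gully occurrence; a negative contrast indicates the opposite.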
Regional early warning model for rainfall induced landslide based on slope unit in Chongqing, China
2024, Engineering Geology
Recent advances in the diversity and systematization of design methods and real-time data have led to a general improvement in spatio-temporal accuracy for regional landslide early-warning (LEW). However, the heterogeneity of the geo-environment and the differences in landslide mechanisms are often neglected in LEW models, which reduces the precision of LEW systems. This study proposes a slope-unit (SU) based regional LEW model for forecasting the real-time probability of rainfall-induced landslides, combining landslide susceptibility assessment and rainfall threshold modeling, taking Chongqing, China as the study case. The SU is adopted to discretize the study area because rainfall-induced shallow and deep-seated landslides occur concurrently there, and grid cells are more appropriate for shallow landslides with homogeneous materials and structures. In addition, four distinct subregions are identified based on the geo-environmental heterogeneity of the study area. For each subregion, specific landslide susceptibility models and rainfall thresholds are developed to account for the different landslide mechanisms. Landslide susceptibility maps (LSM) integrate data-driven methods with the latest 1:50,000 field surveys to achieve accurate predictions of future landslides. Rainfall threshold models are constructed based on a correlation analysis of 2142 landslides and associated historical rainfall events. By using 9-day antecedent rainfall records from 2103 rain gauges and numerical rainfall forecast products for the next 24 h as input data, the LEW model can dynamically release warning information. To validate the performance of the LEW model, the consecutive daily warnings for two rainfall events that induced groups of landslides were retrieved. The results demonstrated an overall satisfactory warning effect, with over 70% of the total rainfall-induced landslides exceeding the yellow alert warning level and a low rate of missed alarms (<15%). This indicates that slope unit partitioning based on the characteristics of rainfall-induced landslides, together with region division according to geological heterogeneity, can effectively contribute to accurate LEW, especially over large areas. Furthermore, the findings revealed that early warnings of landslides induced by persistent rainfall over a large area are more prone to generating false or missed alarms than those for local concentrated rainstorms. The LEW framework proposed in this study is expected to provide valuable technical support to the local authorities for effective landslide risk mitigation in a time-efficient manner.


¹: These authors contributed equally.

