Assessment of landslide susceptibility mapping based on Bayesian hyperparameter optimization: A comparison between logistic regression and random forest
Introduction
Landslides are a type of geological hazard that occurs frequently around the world and causes severe damage (Lombardo and Mai 2018). According to the Emergency Events Database (EM-DAT), landslides caused 4914 deaths, left 27,110 people homeless, and resulted in economic losses of USD 2.1 billion during 2014–2018. A report of the SafeLand FP7 project states that China contains vast areas classified as high landslide risk zones, where landslides cause more than 700 deaths and property and infrastructure damage worth RMB 20 billion yuan every year (http://www.laram.unisa.it/initiatives/safeland). Efficient solutions for reducing and mitigating landslide-related destruction are therefore urgently needed. Landslide susceptibility mapping (LSM) describes the spatial distribution of the probability of landslide occurrence in an area according to its geographical environment. It is a common countermeasure for mitigating the effects of landslides (Huang and Zhao 2018; Merghadi et al. 2020).
At present, various models have been developed based on Geographic Information Systems (GIS) and data mining technology, and much of this research applies statistical analysis and machine learning methods (Li et al. 2019; Zhao et al. 2019). Comparing different models helps assess the capabilities and limitations of each method and the statistical reliability of the resulting LSMs (Wang et al. 2020a). Logistic Regression (LR) and Random Forest (RF) are the two most frequently adopted models for LSM. Both are suitable for analyzing the presence or absence of landslides, and a few studies have compared them. For example, Tsangaratos et al. reported that the RF model has a slightly higher predictive capability than the LR model in Nancheng (China) (Tsangaratos et al. 2016). By contrast, Hong et al. showed that the LR model has a higher predictive capability than the RF model in Lianhua (China) (Hong et al. 2016). However, these studies overlooked a crucial step: they did not consider the hyperparameters of their models.
Unlike ordinary model parameters, which are learned from data during training, hyperparameters are set before training. For instance, the coefficients of the LR model are estimated from the training dataset and are therefore ordinary model parameters. The number of decision trees in the RF model, however, cannot be learned from the data and must be set before training, which makes it a hyperparameter. In machine learning, model performance is closely related to the choice of hyperparameters, and tuning them can substantially improve the accuracy, speed, and reliability of a model (Xie et al. 2021). Model accuracy therefore depends not only on the algorithm but also on its hyperparameters, which makes hyperparameter optimization an essential step in model building. However, hyperparameter optimization has mostly been discussed in computer science. Based on the Gaussian kernel, Wang et al. proposed a hyperparameter selection method for Support Vector Machines with two stages: selecting the kernel parameters and training the optimal penalty factor (Wang et al. 2014). This method offers low computational complexity, high classification accuracy, and short training time. Kang et al. proposed a non-inertial particle thermal optimization method based on precise variance Gaussian Process (GP) regression (Kang et al. 2019). In landslide assessment, few researchers have explored hyperparameter optimization for susceptibility models. One exception is Sun et al., who developed an RF method whose hyperparameters were optimized with a Bayesian algorithm (Sun et al. 2020a). Accordingly, the comparisons of two or more un-optimized models in the existing literature are not convincing, because hyperparameter optimization could further improve the accuracy of the models in those studies. As a result, such comparisons cannot fully reflect the strengths and weaknesses of each model in a particular study area (Hong et al. 2016; Tsangaratos et al. 2016).
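The distinction between parameters and hyperparameters is easy to see in code. The study names Python but not a library; this minimal sketch assumes scikit-learn and uses synthetic data in place of a landslide inventory. C, max_iter and n_estimators are passed to the constructors before training. The LR coefficients, by contrast, only exist after `fit()` has estimated them from the data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a landslide inventory (illustrative only):
# 200 samples, 3 conditioning factors, label 1 = landslide, 0 = non-landslide.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Hyperparameters: fixed BEFORE training by passing them to the constructor.
lr = LogisticRegression(C=1.0, max_iter=100)
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Model parameters: estimated FROM the data during fit().
lr.fit(X, y)
rf.fit(X, y)

print("LR coefficients (learned):", lr.coef_.ravel(), lr.intercept_)
print("RF number of trees (hyperparameter, set in advance):", len(rf.estimators_))
```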
Fengjie County, the study area, is a mountainous region in the Three Gorges Reservoir area of southwest China where landslides occur frequently. In this study, a Bayesian algorithm was applied to optimize the hyperparameters of the LR and RF models, and the optimized models were then explored and compared in Fengjie County. The objectives of this study are to: (1) incorporate hyperparameter optimization, a crucial but often neglected step in LSM, through the Bayesian algorithm; (2) provide a case comparison of LR and RF models that fully accounts for hyperparameter optimization, making the comparison more convincing; and (3) provide a knowledge base for model comparison premised on hyperparameter optimization.
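The text does not state which Bayesian optimization implementation was used. As an assumption, the sketch below shows one common variant: a Gaussian-process surrogate combined with an expected-improvement acquisition function. For brevity it tunes only log10(C) of an LR model on synthetic data and uses cross-validated AUC as the objective. The function names `objective`, `expected_improvement` and `bayes_opt` are hypothetical, and the search bounds are illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for landslide / non-landslide samples (illustrative only)
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)


def objective(log_c):
    """Mean 5-fold cross-validated AUC of LR with C = 10**log_c (to be maximized)."""
    model = LogisticRegression(C=10.0 ** log_c, max_iter=1000)
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()


def expected_improvement(mu, sigma, best, xi=0.01):
    """EI acquisition for maximization under a Gaussian posterior N(mu, sigma^2)."""
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero at already-sampled points
    gain = mu - best - xi
    z = gain / sigma
    return gain * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(bounds=(-3.0, 2.0), n_init=4, n_iter=10, seed=0):
    """GP-based Bayesian optimization of a single continuous hyperparameter."""
    rng = np.random.default_rng(seed)
    xs = list(rng.uniform(bounds[0], bounds[1], size=n_init))  # random initial design
    ys = [objective(x) for x in xs]
    grid = np.linspace(bounds[0], bounds[1], 500).reshape(-1, 1)
    for _ in range(n_iter):
        # 1) Fit the GP surrogate to all (log C, AUC) pairs evaluated so far
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                      normalize_y=True, random_state=seed)
        gp.fit(np.array(xs).reshape(-1, 1), np.array(ys))
        # 2) Choose the candidate with the highest expected improvement
        mu, sigma = gp.predict(grid, return_std=True)
        x_next = float(grid[np.argmax(expected_improvement(mu, sigma, max(ys))), 0])
        # 3) Evaluate the expensive true objective there and extend the history
        xs.append(x_next)
        ys.append(objective(x_next))
    best = int(np.argmax(ys))
    return xs[best], ys[best], ys


best_log_c, best_auc, history = bayes_opt()
print(f"best C = {10 ** best_log_c:.4g}, CV AUC = {best_auc:.3f}")
```

Each iteration refits the surrogate to every evaluation made so far. The next expensive model fit is then spent where the expected gain in AUC is largest. This is why such methods often need fewer evaluations than an exhaustive grid search.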
Description of the study area
Fengjie County, the study area, spans 109°1′17″–109°45′58″E and 30°29′19″–31°22′33″N. Known as the east gate of Chongqing (Fig. 1), Fengjie County has mountainous landforms. It lies at the junction between the Dabashan arc fold fault zone and the eastern Sichuan fold belt and has complex structural stress fields. The lithologies in Fengjie County mainly include Quaternary (Q), Jurassic (J), Triassic (T), Permian (P), Carboniferous (C), Devonian (D), and Silurian (S) strata (Sun et al. 2020a). Under
Methods
The assessment procedure consisted of four phases (Fig. 2):
(a) construction of the spatial database;
(b) preparation of the training and test datasets and hyperparameter optimization for the two models;
(c) generation of the LSMs;
(d) evaluation and comparison of the two models.
ArcGIS and ENVI were the main software platforms, and Python was the programming language.
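The four phases can be sketched end to end in Python. This is a minimal illustration, not the study's workflow. Everything in it is an assumption: the synthetic grid of conditioning factors, the inventory derived from it, the 50 × 100 raster, the 70/30 split and the fixed (unoptimized) hyperparameters. Phase (b)'s optimization step is omitted here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
ROWS, COLS, N_FACTORS = 50, 100, 6  # hypothetical raster size and factor count

# (a) Spatial database: one row per grid cell, one column per conditioning factor
cells = rng.normal(size=(ROWS * COLS, N_FACTORS))
# Hypothetical inventory: a latent "risk" driven by two factors plus noise
risk = cells[:, 0] - 0.8 * cells[:, 2] + rng.normal(scale=0.7, size=ROWS * COLS)
order = np.argsort(risk)
landslide_idx = order[-300:]  # 300 highest-risk cells treated as landslides
non_landslide_idx = rng.choice(order[:-300], size=300, replace=False)  # random non-landslide cells
X = np.vstack([cells[landslide_idx], cells[non_landslide_idx]])
y = np.r_[np.ones(300, dtype=int), np.zeros(300, dtype=int)]

# (b) Training/test split (70/30 is an assumption); hyperparameters fixed, not optimized
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}

lsm, auc = {}, {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # (c) LSM: landslide probability for every cell, reshaped back onto the raster grid
    lsm[name] = model.predict_proba(cells)[:, 1].reshape(ROWS, COLS)
    # (d) Evaluation: AUC on the held-out test samples
    auc[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print({k: round(v, 3) for k, v in auc.items()})
```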
Model hyperparameter optimization
Table 3 lists the five main hyperparameters of the LR model. tol kept its default value of 1e−4, and max_iter kept its default value of 100 (an integer). The remaining three, solver, penalty and C, were optimized, and the hyperparameter values from each iteration were recorded as output. As shown in Fig. 3, the AUC values ranged from 0.755 to 0.799 across hyperparameter settings. At the maximum AUC (0.799), the optimal hyperparameter values were as follows:
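The hyperparameter names in Table 3 (tol, max_iter, solver, penalty, C) match scikit-learn's `LogisticRegression`, so the sketch below assumes that class. It defines a mixed categorical and continuous search space that respects which penalties each solver accepts, together with an AUC objective that leaves tol and max_iter at their defaults. Several choices are assumptions: the driver is plain random search, substituted for brevity in place of the study's Bayesian optimizer. The data, the restriction to l1/l2 penalties and the C range (10^-3 to 10^2) are also assumed, and `VALID_PENALTIES`, `lr_auc` and `sample_config` are hypothetical names.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Penalties each scikit-learn solver accepts (restricted to l1/l2 in this sketch)
VALID_PENALTIES = {
    "lbfgs": ["l2"],
    "newton-cg": ["l2"],
    "sag": ["l2"],
    "liblinear": ["l1", "l2"],
    "saga": ["l1", "l2"],
}

# Synthetic stand-in for landslide / non-landslide samples (illustrative only)
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=1)


def lr_auc(solver, penalty, C):
    """Objective: mean 5-fold CV AUC; tol and max_iter stay at their defaults."""
    model = LogisticRegression(solver=solver, penalty=penalty, C=C, tol=1e-4, max_iter=100)
    with warnings.catch_warnings():
        # sag/saga may not fully converge in 100 iterations; keep the output readable
        warnings.simplefilter("ignore", ConvergenceWarning)
        return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()


def sample_config(rng):
    """Draw one valid (solver, penalty, C) combination; C is sampled log-uniformly."""
    solver = str(rng.choice(list(VALID_PENALTIES)))
    penalty = str(rng.choice(VALID_PENALTIES[solver]))
    C = float(10 ** rng.uniform(-3, 2))
    return {"solver": solver, "penalty": penalty, "C": C}


rng = np.random.default_rng(0)
trials = [(lr_auc(**cfg), cfg) for cfg in (sample_config(rng) for _ in range(20))]
best_auc, best_cfg = max(trials, key=lambda t: t[0])
aucs = [a for a, _ in trials]
print(f"AUC range: {min(aucs):.3f}-{max(aucs):.3f}; best configuration: {best_cfg}")
```

Checking solver and penalty compatibility in the sampler matters. Without it, some combinations (for example lbfgs with l1) would raise an error instead of returning an AUC.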
Importance of conditioning factors
The influence of each conditioning factor on landslide occurrence varies, so analyzing factor importance can provide valuable guidance for landslide disaster management. The above analyses indicate that the RF model performs better in this study. The “Mean Decrease Gini” of the RF model is therefore used to rank the factors by importance. Fig. 9 illustrates the importance of the factors premised on the
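In scikit-learn, `RandomForestClassifier.feature_importances_` gives the mean decrease in impurity. With `criterion="gini"`, this is the usual Python counterpart of Mean Decrease Gini. scikit-learn normalizes the values to sum to 1, so their scale differs from the Mean Decrease Gini reported by R's randomForest package, although the ranking idea is the same. The factor names and data in this sketch are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical conditioning factors (names for illustration only)
factors = ["slope", "elevation", "lithology", "distance_to_river", "ndvi", "rainfall"]
rng = np.random.default_rng(7)
X = rng.normal(size=(600, len(factors)))
# Synthetic labels driven mainly by "slope" and "distance_to_river"
y = (1.5 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=600) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=300, criterion="gini", random_state=0).fit(X, y)

# Gini impurity decrease averaged over trees (normalized to sum to 1), sorted descending
ranking = sorted(zip(factors, rf.feature_importances_), key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name:>18s}  {score:.3f}")
```

Impurity-based importance can favor continuous or high-cardinality factors. Permutation importance is a common cross-check when factors differ strongly in type.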
Conclusion
In this study, optimized LR and RF landslide susceptibility models were developed through hyperparameter optimization. The two models were then compared in Fengjie County, China, a typical landslide-prone area. The following conclusions were drawn:
- (1) With Bayesian hyperparameter optimization, the AUC on the test dataset improved by 4% for the LR model and by 10% for the RF model, indicating that both models'
Funding
This research was funded by the National Key Research and Development Program of China (No. 2018 YFC 1505501), the Natural Science Foundation of Chongqing (Grant No. cstc2020jcyj-msxmX0841), and Humanities and Social Sciences Foundation of the Ministry of Education of China (Grant No. 20XJAZH002).
Declaration of Competing Interest
The authors declare no conflict of interest.
Acknowledgments
We want to express our gratitude to Chongqing Meteorological Administration for providing essential meteorological data and also to Chongqing Institute of Geology and Mineral Resources for providing valuable research data on historical landslides. We are also grateful to the editors and anonymous reviewers for their valuable comments on this manuscript.
References (27)
- et al. Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naive Bayes tree for landslide susceptibility modeling. Sci. Total Environ. (2018).
- et al. Dealing with categorical and integer-valued variables in Bayesian Optimization with Gaussian processes. Neurocomputing (2020).
- et al. Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. Comput. Geosci. (2015).
- et al. Quantitative assessment of landslide susceptibility along the Xianshuihe fault zone, Tibetan Plateau, China. Geomorphology (2015).
- et al. Landslide susceptibility assessment in Lianhua County (China): a comparison between a random forest data mining technique and bivariate and multivariate statistical models. Geomorphology (2016).
- et al. Review on landslide susceptibility mapping using support vector machines. Catena (2018).
- et al. Different landslide sampling strategies in a grid-based bi-variate statistical susceptibility model. Geomorphology (2016).
- et al. Modeling landslide susceptibility in data-scarce environments using optimized data mining and statistical methods. Geomorphology (2018).
- et al. Presenting logistic regression-based landslide susceptibility results. Eng. Geol. (2018).
- et al. Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environ. Model. Softw. (2010).
- A review of statistically-based landslide susceptibility models. Earth Sci. Rev.
- Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology.
- Super-parameter selection for Gaussian-Kernel SVM based on outlier-resisting. Measurement.