Elsevier

Engineering Geology

Volume 281, February 2021, 105972

Assessment of landslide susceptibility mapping based on Bayesian hyperparameter optimization: A comparison between logistic regression and random forest

https://doi.org/10.1016/j.enggeo.2020.105972

Highlights

  • A database of 1520 landslides in Fengjie County, Three Gorges Reservoir (TGR), was created.

  • Model hyperparameters were optimized by the Bayesian algorithm.

  • Comparison between LR and RF models premised on hyperparameter optimization.

  • AUC values of the LR and RF models are improved by 4% and 10%, respectively.

  • Susceptibility map by the optimized models has higher prediction efficiency.

Abstract

This study develops two optimized models for landslide susceptibility mapping (LSM), logistic regression (LR) and random forest (RF), based on hyperparameter optimization using the Bayesian algorithm, and compares their applicability in a typical landslide-prone area (Fengjie County, China). First, data on 1520 historical landslides were collected from field investigations and literature reviews to construct a spatial database of 16 conditioning factors. Subsequently, the Bayesian algorithm was adopted to optimize the hyperparameters of the LR and RF models, based on the dataset of all cells (including landslides and non-landslides). Finally, the two optimized models were evaluated and compared using the area under the curve (AUC) and the confusion matrix. With the Bayesian algorithm, the AUC value on the test dataset improved by 4% for the LR model and by 10% for the RF model, indicating that hyperparameter optimization had a considerable impact on the accuracy of both models and is therefore very important for LSM models. Although both models perform reasonably, the optimized RF model has better stability and predictive capability in the study area. These findings supply a crucial step in LSM (hyperparameter optimization) through the Bayesian algorithm and provide a comparison case for the LR and RF models after comprehensive consideration of hyperparameter optimization, which makes the comparison more convincing and provides a knowledge base for model comparison premised on hyperparameter optimization.

Introduction

Landslides, a type of geological hazard, occur frequently around the world and have severe destructive consequences (Lombardo and Mai 2018). According to the Emergency Events Database (EM-DAT), during 2014–2018 landslides caused 4914 deaths, left 27,110 people homeless, and caused economic losses of USD 2.1 billion. A report of the Safe Land-FP 7 project states that China includes vast areas classified as high landslide risk zones, where landslides cause more than 700 deaths and property and infrastructure damage worth RMB 20 billion every year (http://www.laram.unisa.it/initiatives/safeland). Therefore, efficient solutions to reduce and mitigate landslide-related destruction are urgently needed. Landslide susceptibility mapping (LSM), which describes the spatial distribution of landslide occurrence probability in a certain area according to the geographical environment, is considered a common countermeasure for mitigating the effects of landslides (Huang and Zhao 2018; Merghadi et al. 2020).

At present, various models have been designed based on Geographic Information Systems (GIS) and data mining technology, with much of the research applying statistical analysis and machine learning methods (Li et al. 2019; Zhao et al. 2019). Meanwhile, comparing different models facilitates better assessment of the abilities and limitations of each method and of the statistical reliability of the LSM generated (Wang et al. 2020a). Logistic Regression (LR) and Random Forest (RF) are the two most frequently adopted models for LSM, and both are suitable for analyzing the presence/absence of a landslide; a few studies have compared these two models. For example, Tsangaratos et al. reported that the RF model has a slightly higher predictive capability than the LR model in Nancheng (China) (Tsangaratos et al. 2016), while Hong et al. demonstrated that the LR model exhibits a higher predictive capability than the RF model in Lianhua (China) (Hong et al. 2016). However, these studies overlooked a crucial step: they did not consider the hyperparameters of their models.

Unlike general model parameters, which are obtained through training on data, hyperparameters are set before model training. For instance, the coefficients of the LR model are obtained through training on the dataset and are general model parameters; the number of decision trees in the RF model cannot be obtained through training and must be set before training, which makes it a hyperparameter. In machine learning, the performance of a model is closely related to its hyperparameters. By adjusting the hyperparameter settings, the accuracy, operating speed, and reliability of models can be greatly improved (Xie et al. 2021). The accuracy of a model therefore depends not only on the algorithm used but also on its hyperparameters, which makes hyperparameter optimization indispensable for any model. However, discussions of hyperparameter optimization mostly appear in the computer science literature on algorithms. Based on the Gaussian kernel, Wang et al. proposed a Support Vector Machine hyperparameter selection method that includes two stages: selecting the kernel parameters and training the optimal penalty factors (Wang et al. 2014). This method has the advantages of low computational complexity, high classification accuracy, and short training time. Kang et al. proposed a non-inertial particle thermal optimization method based on precise variance Gaussian Process (GP) regression (Kang et al. 2019). In the field of landslide assessment, few scholars have explored hyperparameter optimization for landslide susceptibility models. Sun et al. developed an optimized RF method based on hyperparameter optimization using the Bayesian algorithm (Sun et al. 2020a). Accordingly, comparisons of two or more un-optimized models in the relevant comparative literature are not convincing, because hyperparameter optimization could further improve the accuracy of the models described in these studies. In fact, such comparisons are unable to reflect the strengths and weaknesses of each model in a particular study area (Hong et al. 2016; Tsangaratos et al. 2016).
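The distinction between model parameters and hyperparameters can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-feature logistic regression fitted by plain gradient descent, and the function name, the toy data, and the learning_rate setting are hypothetical. The coefficients w and b are learned from the data, whereas learning_rate and max_iter are fixed before training.

```python
import math

def train_logistic_regression(xs, ys, learning_rate=0.1, max_iter=500):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss.

    learning_rate and max_iter are hyperparameters: they are chosen before training.
    w and b are model parameters: they are learned from the data.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_iter):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

# Made-up toy data: one scaled conditioning factor per cell, label 1 = landslide cell.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

# Same data and same algorithm, different hyperparameter setting.
w_short, b_short = train_logistic_regression(xs, ys, max_iter=5)
w_long, b_long = train_logistic_regression(xs, ys, max_iter=500)
print("max_iter=5   -> w = %.3f" % w_short)
print("max_iter=500 -> w = %.3f" % w_long)
```

The two runs share the data and the algorithm, yet they return different coefficients because one hyperparameter differs, which is why comparing models without tuning their hyperparameters can be misleading.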

The study area, Fengjie County, is a mountainous region in the Three Gorges Reservoir area, southwest China, where landslides occur frequently. In the present study, a Bayesian algorithm was applied to optimize the hyperparameters of the LR and RF models, and the optimized models were then explored and compared in Fengjie County. This study aims to: (1) supply the crucial step in LSM (hyperparameter optimization) through the Bayesian algorithm; (2) provide a comparison case for the LR and RF models after comprehensive consideration of hyperparameter optimization, so as to make the comparison of these models more convincing; and (3) provide a knowledge base for model comparison: comparison premised on hyperparameter optimization.
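The text here does not state which surrogate model or acquisition function the Bayesian algorithm uses. The sketch below therefore assumes one common variant, a Gaussian process surrogate with an expected improvement criterion, applied to a single hyperparameter scaled to [0, 1]. The objective toy_validation_auc is a hypothetical stand-in for "train a model and return its validation AUC"; all names and numbers are illustrative only.

```python
import math

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(matrix):
    """Lower-triangular factor L with L * L^T = matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / low[j][j]
    return low

def forward_solve(low, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z

def backward_solve(low, z):
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    n = len(xs)
    cov = [[rbf_kernel(xs[i], xs[j]) + (noise if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    alpha = backward_solve(low, forward_solve(low, ys))
    k_star = [rbf_kernel(x, x_new) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_solve(low, k_star)
    variance = max(1.0 - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(variance)

def expected_improvement(mean, std, best):
    """Expected improvement over the current best value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def standardize(ys):
    """Rescale observations to zero mean and unit spread for the GP."""
    mean = sum(ys) / len(ys)
    std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
    return [(y - mean) / std for y in ys]

def bayesian_optimize(objective, candidates, init_points, n_iter=8):
    """Evaluate the objective where expected improvement is largest."""
    xs = list(init_points)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        scaled = standardize(ys)
        best = max(scaled)
        scored = []
        for c in candidates:
            if c in xs:  # skip settings that were already evaluated
                continue
            mean, std = gp_posterior(xs, scaled, c)
            scored.append((expected_improvement(mean, std, best), c))
        if not scored:
            break
        x_next = max(scored)[1]
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], xs

def toy_validation_auc(x):
    """Hypothetical stand-in for 'train a model with setting x, return its AUC'."""
    return 0.80 - 0.2 * (x - 0.6) ** 2

candidates = [i / 20 for i in range(21)]
best_x, best_y, history = bayesian_optimize(toy_validation_auc, candidates, [0.0, 0.5, 1.0])
print("best setting:", best_x, "objective:", round(best_y, 4))
```

In practice the objective would be expensive (each call trains and validates a model), which is the reason for choosing the next setting from a surrogate instead of evaluating every candidate.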

Section snippets

Description of the study area

Fengjie County, the area of this study, spans 109°1′17″–109°45′58″E and 30°29′19″–31°22′33″N. As the east gate of Chongqing (Fig. 1), Fengjie County has mountainous landforms and is located at the junction between the Dabashan arc fold fault zone and the eastern Sichuan fold belt, with complex structural stress fields. The lithologies in Fengjie County largely include those of the Quaternary (Q), Jurassic (J), Triassic (T), Permian (P), Carboniferous (C), Devonian (D), and Silurian (S) (Sun et al. 2020a). Under

Methods

The assessment procedure consisted of four phases: (a) the construction of the spatial database, (b) formulation of the training and test datasets and hyperparameter optimization for the two models, (c) generation of LSMs, and (d) evaluation and comparison of the two models (Fig. 2). The main operating software and platform were ArcGIS and ENVI, and the programming language was Python.
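The two evaluation measures named in this study, the AUC and the confusion matrix, can be computed as in the following minimal sketch. The labels and scores are made-up values, and the 0.5 classification threshold is an assumption and not a value taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive cell is scored above
    a random negative cell (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def confusion_matrix(labels, scores, threshold=0.5):
    """Count true/false positives and negatives at a given threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

# Made-up test cells: 1 = landslide, 0 = non-landslide, with predicted susceptibility.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

auc = roc_auc(labels, scores)
cm = confusion_matrix(labels, scores)
accuracy = (cm["tp"] + cm["tn"]) / len(labels)
print(auc, cm, accuracy)
```

The AUC summarizes ranking quality over all thresholds, while the confusion matrix describes the errors made at one chosen threshold, so the two measures complement each other.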

Model hyperparameter optimization

Table 3 lists the five main hyperparameters included in the LR model: Tol had a default value of 1e−4; max_iter had a default value of 100, with int as its default data type; and solver, penalty, and C were optimized, with the hyperparameter values obtained in each iteration as output. As can be observed in Fig. 3, the AUC values ranged from 0.755 to 0.799 under different hyperparameter values. When the AUC value reached the maximum (0.799), the optimal hyperparameter values were as follows:

Importance of conditioning factors

The impact of each conditioning factor on the occurrence of landslides varies; hence, analyzing the importance of the factors for landslide occurrence can provide valuable guidance for landslide disaster management. The above analyses indicated that the RF model performs better in this case study; therefore, the “Mean Decrease Gini” of the RF model was used to rank the factors by importance. Fig. 9 illustrates the importance of the factors premised on the

Conclusion

In this study, optimized LR and RF landslide susceptibility models were proposed through hyperparameter optimization. The two models were compared in a typical landslide-prone area, Fengjie County, China. The following conclusions were drawn:

  • (1) Based on the Bayesian algorithm, the AUC value of the test dataset in LR model is improved by 4%, while the AUC value of the test dataset in RF model is improved by 10%, indicating that both models'

Funding

This research was funded by the National Key Research and Development Program of China (No. 2018 YFC 1505501), the Natural Science Foundation of Chongqing (Grant No. cstc2020jcyj-msxmX0841), and Humanities and Social Sciences Foundation of the Ministry of Education of China (Grant No. 20XJAZH002).

Declaration of Competing Interest

The authors declare no conflict of interest.

Acknowledgments

We want to express our gratitude to Chongqing Meteorological Administration for providing essential meteorological data and also to Chongqing Institute of Geology and Mineral Resources for providing valuable research data on historical landslides. We are also grateful to the editors and anonymous reviewers for their valuable comments on this manuscript.
