Machine learning-based landslide susceptibility assessment with optimized ratio of landslide to non-landslide samples, Gondwana Research


Gondwana Research (IF 7.2), Pub Date: 2022-05-25, DOI: 10.1016/j.gr.2022.05.012
Can Yang, Lei-Lei Liu, Faming Huang, Lei Huang, Xiao-Mi Wang

Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The accuracy of machine learning-based LSA often hinges on the ratio of landslide to non-landslide (or positive/negative, P/N) samples. A proper P/N sample ratio can significantly improve the performance of machine learning-based LSA, whereas an improper ratio can cause inadequate training or data pollution. Conventionally, the P/N sample ratio is determined from experience or by trial and error, which introduces substantial uncertainty. This paper proposes a Bayesian optimization method to optimize the P/N sample ratio for machine learning models. Firstly, AnHua County in Hunan Province, China, is selected as the study area because of the numerous landslide disasters that have occurred there in recent years. Secondly, three representative machine learning models, namely the support vector machine (SVM), the random forest (RF) and the gradient boosting decision tree (GBDT), are adopted to assess landslide susceptibility. Subsequently, a Bayesian optimization algorithm is used to obtain the optimal P/N sample ratio, considering the effects of various training/test set ratios. Finally, the improved models and the corresponding landslide susceptibility maps are established using the obtained optimal P/N sample ratio. The results show that the performance of SVM, RF and GBDT is improved in every case with the optimized P/N sample ratio. The highest AUC value is obtained by the RF model (0.840, improved by 1.3%), followed by GBDT (0.831, improved by 1.3%) and SVM (0.775, improved by 0.7%). RF and GBDT are, however, more suitable than SVM for addressing sample imbalance in LSA. It is suggested that the Bayesian optimization algorithm be used to optimize the P/N sample ratio in machine learning-based LSA models.
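The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a synthetic dataset from scikit-learn in place of the landslide inventory, parametrizes the P/N ratio as the number of non-landslide samples drawn per landslide sample (a P/N ratio of 1:r), and uses a hand-rolled Bayesian optimization loop (Gaussian process surrogate plus expected improvement) with an RF classifier. The search range, iteration counts and helper names (`auc_for_ratio`, `optimize_ratio`) are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a landslide inventory (assumption): class 1 plays
# the role of landslide cells, class 0 the role of non-landslide cells.
X, y = make_classification(n_samples=3000, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
pos_idx = np.flatnonzero(y_train == 1)
neg_idx = np.flatnonzero(y_train == 0)


def auc_for_ratio(neg_per_pos):
    """Train RF on all positives plus a negative subsample; return test AUC."""
    n_neg = int(round(neg_per_pos * len(pos_idx)))
    n_neg = min(len(neg_idx), max(1, n_neg))
    sampler = np.random.default_rng(42)  # fixed seed keeps the objective deterministic
    chosen = sampler.choice(neg_idx, size=n_neg, replace=False)
    idx = np.concatenate([pos_idx, chosen])
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_train[idx], y_train[idx])
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


def expected_improvement(mu, sigma, best):
    """Expected improvement acquisition for a maximization problem."""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)


def optimize_ratio(low=0.5, high=5.0, n_init=4, n_iter=6):
    """Search [low, high] negatives-per-positive for the highest AUC."""
    grid = np.linspace(low, high, 91).reshape(-1, 1)
    ratios = [float(r) for r in np.linspace(low, high, n_init)]
    scores = [auc_for_ratio(r) for r in ratios]
    for _ in range(n_iter):
        # Fit the surrogate to all (ratio, AUC) pairs observed so far.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4,
                                      normalize_y=True)
        gp.fit(np.array(ratios).reshape(-1, 1), np.array(scores))
        mu, sigma = gp.predict(grid, return_std=True)
        # Evaluate the candidate that maximizes expected improvement.
        nxt = float(grid[np.argmax(expected_improvement(mu, sigma, max(scores))), 0])
        ratios.append(nxt)
        scores.append(auc_for_ratio(nxt))
    best = int(np.argmax(scores))
    return ratios[best], scores[best], ratios, scores


best_ratio, best_auc, tried_ratios, tried_aucs = optimize_ratio()
print(f"best P/N ratio ~ 1:{best_ratio:.2f}, test AUC = {best_auc:.3f}")
```

Swapping `RandomForestClassifier` for `sklearn.svm.SVC(probability=True)` or `GradientBoostingClassifier` would give a rough analogue of the three-model comparison described above. In practice the AUC would more plausibly be estimated on a validation split or by cross-validation rather than on the test set, which is used here only to keep the sketch short.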

Updated: 2022-05-26
