Evaluation of re-sampling methods on performance of machine learning models to predict landslide susceptibility, Geocarto International

Geocarto International (IF 3.3), Pub Date: 2020-12-22, DOI: 10.1080/10106049.2020.1837257
Moslem Borji Hassangavyar (1), Hadi Eskandari Damaneh (2), Quoc Bao Pham (3, 4), Nguyen Thi Thuy Linh (5), John Tiefenbacher (6), Quang-Vu Bach (7)

Abstract

This study tests the applicability of three resampling methods (bootstrapping, random subsampling and cross-validation) for enhancing the performance of eight machine-learning models, compared with using the original data: boosted regression trees, flexible discriminant analysis, random forests, mixture discriminant analysis, multivariate adaptive regression splines, classification and regression trees, support vector machines and generalized linear models. Model results were evaluated using the correlation (COR), the receiver operating characteristic (ROC) curve and the area under it (AUC), the true skill statistic (TSS) and the probability of detection (POD). The evaluation showed that bootstrapping improved the performance of all models. The bootstrapping-random forest combination (COR = 0.75, AUC = 0.92, TSS = 0.80 and POD = 0.98) proved to be the best model for landslide prediction. Among the 18 contributing factors, distance from fault, curvature and precipitation were the most influential in all 32 models (the eight models, each fitted to the original data and to the data from each of the three resampling methods).
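The four evaluation statistics reported above can be illustrated with a minimal, dependency-free Python sketch. The definitions are common textbook forms assumed here, not taken from the paper: COR as the Pearson correlation between observed 0/1 labels and predicted probabilities, AUC via the Mann-Whitney pairwise formulation, POD as TP / (TP + FN), and TSS as sensitivity + specificity - 1. The 0.5 probability cut-off used for POD and TSS is a hypothetical default, since the abstract does not state the threshold, and all function names are illustrative.

```python
"""Sketch of the evaluation statistics named in the abstract (assumed definitions)."""
from math import sqrt


def cor(y_true, y_prob):
    """Pearson correlation between observed 0/1 labels and predicted probabilities."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_prob) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_prob))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_prob)
    return cov / sqrt(var_t * var_p)


def auc(y_true, y_prob):
    """Probability that a random landslide sample outscores a random non-landslide one (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def confusion(y_true, y_prob, threshold=0.5):
    """Return (TP, FN, TN, FP) after cutting probabilities at a hypothetical threshold."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if t == 1 and pred == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def pod(y_true, y_prob, threshold=0.5):
    """Probability of detection (hit rate, sensitivity): TP / (TP + FN)."""
    tp, fn, _, _ = confusion(y_true, y_prob, threshold)
    return tp / (tp + fn)


def tss(y_true, y_prob, threshold=0.5):
    """True skill statistic: sensitivity + specificity - 1."""
    tp, fn, tn, fp = confusion(y_true, y_prob, threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1


if __name__ == "__main__":
    # Toy labels (1 = landslide) and predicted probabilities, made up for illustration.
    y = [1, 1, 1, 0, 0, 0]
    p = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
    print(auc(y, p))  # 8/9: eight of nine positive/negative pairs are ranked correctly
    print(pod(y, p))  # 2/3: two of three landslides exceed the 0.5 cut-off
    print(tss(y, p))  # 2/3 + 2/3 - 1 = 1/3
    print(cor(y, p))
```

AUC is threshold-free, whereas POD and TSS depend on the chosen cut-off; that is why a threshold appears only in the last two functions.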

Highlights
Landslide hazard prediction with eight machine-learning (ML) models.
Multiple morphometric, climatic, geologic, vegetation and human factors were used.
The applicability of three resampling methods was tested.
The performance of the standalone ML models and the coupled (resampling plus ML) models was assessed.
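The three resampling methods named in the highlights and abstract differ mainly in how training and test samples are drawn. The minimal Python sketch below yields (train, test) index lists for each scheme; a model would then be fitted on each training set and scored on the matching test set, with the statistics averaged across replicates. That workflow is an assumption, not the authors' documented procedure, and the replicate count, 30% test fraction and five folds are hypothetical defaults rather than values from the paper.

```python
"""Sketch of bootstrapping, random subsampling and k-fold cross-validation."""
import random


def bootstrap(n, replicates=5, seed=0):
    """Train on n draws with replacement; test on the out-of-bag samples."""
    rng = random.Random(seed)
    for _ in range(replicates):
        train = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train)
        test = [i for i in range(n) if i not in in_bag]
        yield train, test


def random_subsampling(n, replicates=5, test_fraction=0.3, seed=0):
    """Repeated random train/test splits drawn without replacement."""
    rng = random.Random(seed)
    n_test = int(round(n * test_fraction))
    for _ in range(replicates):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


def cross_validation(n, folds=5, seed=0):
    """k-fold CV: every sample lands in exactly one test fold."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    for k in range(folds):
        test = idx[k::folds]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        yield train, test


if __name__ == "__main__":
    for name, splits in [
        ("bootstrap", bootstrap(20)),
        ("subsampling", random_subsampling(20)),
        ("cross-validation", cross_validation(20)),
    ]:
        for train, test in splits:
            print(name, len(train), len(test))
```

Because bootstrapping draws n samples with replacement, about 1/e (roughly 37%) of the samples are expected to be left out of each bootstrap set, and those out-of-bag samples serve as the test set; random subsampling and cross-validation instead hold out a fixed share without replacement.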
