Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques
Ecological Indicators (IF 6.9), Pub Date: 2021-02-17, DOI: 10.1016/j.ecolind.2021.107499
Zohre Ebrahimi-Khusfi, Ali Reza Nafarzadegan, Fatemeh Dargahian

In recent decades, some desert wetlands have become critical sources of dust in the arid and semi-arid regions of the world. Accurate prediction of the number of dusty days (NDDs) in these areas is of great importance. Machine learning (ML) is among the most popular methods for predicting climatic and environmental variables. Although it has received considerable attention for spatial prediction, it has received less attention for the temporal prediction of these variables. This work is the first effort to predict NDDs in the major source of dust production in southeastern Iran using ML models and different feature selection (FS) techniques. For this purpose, monthly data on 21 predictor variables over the study period (1988–2017) were used to predict the target variable (NDDs). The main aim was to evaluate the support vector machine (SVM), conditional inference random forest (CRF), and stochastic gradient boosting (SGB) models, each combined with three FS algorithms (Boruta, multivariate adaptive regression splines (MARS), and recursive feature elimination (RFE)), in predicting NDDs around the Hamoun wetlands. After analyzing collinearity and removing the independent variables with a tolerance < 0.11, the best attributes were selected to train the SVM, SGB, and CRF models. All datasets were randomly split into training (70%) and verification (30%) sets. Model performance was evaluated on the holdout data using the coefficient of determination (R-square), root mean square error (RMSE), mean absolute error (MAE), and the Nash–Sutcliffe efficiency (NSE) coefficient. The results indicated that SGB-MARS, SGB-RFE, and SGB-Boruta outperformed the other models across FS techniques in terms of R-square (0.9), RMSE (2.5), MAE (1.9), and NSE (0.9). Furthermore, surface wind speed, maximum air temperature, relative humidity, dried wetland bed, and erosive wind frequency were identified as the most important factors for predicting NDDs in the study area. This study encourages the use of the SGB model with various FS techniques to predict NDDs around desert wetlands. These results can help decision-makers reduce the risks of dust emission and improve the safety of residents around the desert wetlands.
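
The holdout evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say which software was used, and the function names `holdout_split` and `evaluate` are hypothetical. The sketch also assumes that R-square means the squared Pearson correlation between observed and predicted values, which the abstract does not specify.

```python
import math
import random


def holdout_split(rows, train_fraction=0.7, seed=0):
    """Randomly split rows into training and verification subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def evaluate(observed, predicted):
    """Return R-square, RMSE, MAE, and NSE for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    return {
        # Assumption: R-square taken as the squared Pearson correlation.
        "r2": cov * cov / (ss_obs * ss_pred),
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        # NSE: 1 minus squared error relative to variance around the observed mean.
        "nse": 1.0 - sse / ss_obs,
    }


# Hypothetical usage: 360 placeholder record indices (30 years x 12 months),
# not the study's actual data.
train_idx, verify_idx = holdout_split(range(360))
scores = evaluate([2.0, 4.0, 6.0, 8.0], [3.0, 5.0, 7.0, 9.0])
```

Under this definition, R-square measures only linear association, whereas NSE also penalizes bias. A model whose predictions are uniformly offset from the observations can therefore score a perfect R-square and still have an NSE below 1.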




Updated: 2021-02-18