Comparison of eight filter-based feature selection methods for monthly streamflow forecasting – three case studies on CAMELS data sets
Journal of Hydrology (IF 5.9), Pub Date: 2020-07-01, DOI: 10.1016/j.jhydrol.2020.124897
Kun Ren, Wei Fang, Jihong Qu, Xia Zhang, Xiaoyu Shi

Abstract: Data-driven models are increasingly used to forecast streamflow. However, the performance of filter-based feature selection (FFS) methods in data-driven models for monthly streamflow forecasting has not been studied in detail. In this study, we investigated the effectiveness of eight common FFS methods (linear Pearson correlation, partial linear Pearson correlation (PCI), mutual information (MI), conditional MI, partial MI, maximal relevance minimal redundancy Pearson correlation, maximal relevance minimal redundancy MI, and the gamma test) on three regression models (multiple linear regression (MLR), ensemble extreme learning machine (enELM), and k-nearest neighbor (KNN) regression) for real-world one-month-ahead streamflow forecasting. The study was conducted on three cases from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) data sets. In addition, two termination criterion (TC) methods, the Hampel test and resampling, were compared. The results highlight three findings. First, no single FFS method was dominant when coupled with enELM or KNN. Second, when resampling was used to select a final model from the candidate combinations of the eight FFS methods and three regression models, PCI was the most favorable FFS method for the final model. Finally, the Hampel test TC was superior to the resampling TC in terms of stability and resistance to overfitting. These findings offer practical reference value for real-world monthly streamflow forecasting.
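
To make the idea of a filter-based selection step with a Hampel-test termination criterion concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract gives no procedural details, so the function names (`lagged_matrix`, `pearson_scores`, `hampel_z`), the synthetic data, and the cutoff of 3 on the Hampel distance are all assumptions chosen for illustration. The sketch scores each candidate input by its absolute linear Pearson correlation with the target and keeps the candidates whose score is a robust outlier relative to the rest.

```python
import numpy as np

def lagged_matrix(q, max_lag):
    """Build (X, y) from a series q: column k-1 of X is q lagged by k steps."""
    q = np.asarray(q, dtype=float)
    y = q[max_lag:]
    X = np.column_stack(
        [q[max_lag - k : len(q) - k] for k in range(1, max_lag + 1)]
    )
    return X, y

def pearson_scores(X, y):
    """Absolute Pearson correlation of each candidate column with the target."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

def hampel_z(scores):
    """Hampel distance: deviation from the median, scaled by 1.4826 * MAD.

    Assumes the median absolute deviation of the scores is nonzero.
    """
    d = np.abs(scores - np.median(scores))
    return d / (1.4826 * np.median(d))

# Synthetic illustration (assumed data, not CAMELS): 240 "months" and
# 12 hypothetical candidate inputs, of which only candidate 3 drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(240, 12))
y = 2.0 * X[:, 3] + rng.normal(size=240)

scores = pearson_scores(X, y)
z = hampel_z(scores)
selected = np.flatnonzero(z > 3.0)  # assumed cutoff; keep robust outliers only
print("selected candidate indices:", selected)
```

For a real one-month-ahead setup, `lagged_matrix(flow, max_lag)` would supply lagged flows as candidate columns in place of the synthetic `X`. The other FFS methods named in the abstract would replace `pearson_scores` with a different relevance measure, while a resampling termination criterion would replace the Hampel cutoff with a validation-based stopping rule.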

Updated: 2020-07-01