A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification
Journal of Hydroinformatics (IF 2.2), Pub Date: 2021-09-01, DOI: 10.2166/hydro.2021.060
Pingjie Huang 1, Lixiang Wang 1, Dibo Hou 1, Wangli Lin 1, Jie Yu 1, Guangxin Zhang 1, Hongjian Zhang 1

Water quality monitoring is necessary to prevent river water pollution effectively. However, existing water quality assessment methods characterize water quality conditions only to a limited extent, and few studies have focused on feature extraction for water pollution identification or on obtaining accurate information about pollution sources. This study therefore proposes a feature extraction method based on the entropy-minimal description length principle and the gradient boosting decision tree (GBDT) algorithm for identifying the type of surface water pollution, taking into account the distribution characteristics and intrinsic associations of conventional water quality indicators. To improve robustness to noise, coarse-grained discretization features were constructed for each water quality indicator based on information entropy. The GBDT algorithm was then used to mine the nonlinear correlation between water quality indicators and pollution classes and to obtain tree-transformed features. Water samples collected by the Environmental Monitoring Center of a southern city were used to test the performance of the proposed algorithm. Experimental results demonstrate that the features extracted by the proposed method are more effective than both the raw water quality indicators without feature engineering and the features extracted by principal component analysis.
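The abstract does not give the details of the discretization step, so the following is only a minimal sketch of one common reading of "entropy-minimal description length" discretization: the recursive entropy-based binary split with the MDL stopping criterion of Fayyad and Irani. Treating this as the variant used in the paper is an assumption, and the function names (`entropy`, `best_cut`, `mdlp_cuts`, `discretize`) and the toy readings are hypothetical, not taken from the paper. The GBDT tree-transformation step is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Return (index, weighted_entropy) of the boundary with minimal class entropy.

    `pairs` is a list of (value, label) sorted by value; a cut at index i
    separates pairs[:i] from pairs[i:]. Returns None if no cut is possible.
    """
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        w = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or w < best[1]:
            best = (i, w)
    return best


def mdlp_cuts(values, labels):
    """Recursively collect cut points accepted by the MDL stopping criterion."""
    cuts = []

    def recurse(part):
        n = len(part)
        found = best_cut(part)
        if found is None:
            return
        i, weighted = found
        whole = [lab for _, lab in part]
        left = [lab for _, lab in part[:i]]
        right = [lab for _, lab in part[i:]]
        gain = entropy(whole) - weighted
        k, k1, k2 = len(set(whole)), len(set(left)), len(set(right))
        delta = log2(3 ** k - 2) - (
            k * entropy(whole) - k1 * entropy(left) - k2 * entropy(right)
        )
        if gain <= (log2(n - 1) + delta) / n:
            return  # the split does not pay for its description length
        cuts.append((part[i - 1][0] + part[i][0]) / 2)  # midpoint as cut point
        recurse(part[:i])
        recurse(part[i:])

    recurse(sorted(zip(values, labels)))
    return sorted(cuts)


def discretize(value, cuts):
    """Map a raw reading to the index of the interval it falls into."""
    return sum(value > c for c in cuts)


# Hypothetical readings of one indicator: low values normal, high values polluted.
values = [float(v) for v in range(1, 21)]
labels = ["normal"] * 10 + ["polluted"] * 10
cuts = mdlp_cuts(values, labels)
coarse = [discretize(v, cuts) for v in values]
```

On this toy input a single cut separates the two classes, so each reading collapses to interval 0 or 1. Small fluctuations within an interval no longer change the feature value, which is the sense in which coarse-grained discretization can improve robustness to noise.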



Updated: 2021-09-24