Two novelty learning models developed based on deep cascade forest to address the environmental imbalanced issues: A case study of drinking water quality prediction,Environmental Pollution


Environmental Pollution (IF 7.6), Pub Date: 2021-09-11, DOI: 10.1016/j.envpol.2021.118153
Xingguo Chen, Houtao Liu, Fengrui Liu, Tian Huang, Ruqin Shen, Yongfeng Deng, Da Chen


Environmental quality data sets are typically imbalanced because environmental pollution events are rarely observed in daily life. Prediction on imbalanced data sets is a major challenge in machine learning. Our recent work has shown that deep cascade forest (DCF) is a promising base learning model for environmental quality prediction. Although some traditional models have been improved by introducing a cost matrix, little is known about whether a cost matrix can enhance the prediction performance of DCF. Feature extraction is another way to potentially improve a model's ability to predict imbalanced data. Here, we developed two novel learning models based on DCF: cost-sensitive DCF (CS-DCF) and DCF combined with unsupervised learning models and greedy methods (USM-DCF-G). Both models were then validated on an imbalanced drinking water quality data set. Our results show that both CS-DCF and USM-DCF-G achieve better prediction performance than DCF alone. In particular, USM-DCF-G, after feature extraction and selection using unsupervised learning models and greedy methods, shows the best performance, with the highest F1-score (95.12 ± 2.56%). Thus, the two learning models, especially USM-DCF-G, are promising for addressing environmental imbalance issues and accurately predicting environmental quality.
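
The abstract relies on two general ideas: a cost matrix that penalizes missed minority-class samples more heavily, and the F1-score as the evaluation metric. The sketch below illustrates both in plain Python. It is not the authors' CS-DCF implementation; the function names, the cost values, and the toy labels and probabilities are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def min_cost_class(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> int:
    """Pick the class with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=lambda j: expected[j])


# Hypothetical cost matrix: missing a polluted sample (true 1, predicted 0)
# is assumed to be five times as costly as a false alarm.
COST = [[0.0, 1.0],
        [5.0, 0.0]]

# Toy data (assumed): class 1 is the rare "polluted" class.
y_true = [0, 0, 0, 0, 1, 1]
p_positive = [0.1, 0.2, 0.1, 0.6, 0.3, 0.8]

# Plain decision rule: predict the more probable class.
plain_pred = [1 if p > 0.5 else 0 for p in p_positive]
# Cost-sensitive decision rule: predict the class with minimum expected cost.
cost_pred = [min_cost_class([1.0 - p, p], COST) for p in p_positive]

print("plain F1:", f1_score(y_true, plain_pred))
print("cost-sensitive F1:", f1_score(y_true, cost_pred))
```

With an asymmetric cost matrix, the decision threshold for the rare class shifts below 0.5, which trades some precision for higher recall. Whether this improves F1 on real data depends on the data set and the chosen costs.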


Updated: 2021-09-15
