Predicting Micropollutant Removal by Reverse Osmosis and Nanofiltration Membranes: Is Machine Learning Viable?

Environmental Science & Technology (IF 10.8), Pub Date: 2021-08-03, DOI: 10.1021/acs.est.1c04041
Nohyeong Jeong, Tai-Heng Chung, Tiezheng Tong

Predictive models for micropollutant removal by membrane separation are highly desirable for the design and selection of appropriate membranes. While machine learning (ML) models have been applied for such purposes, their reliability might be compromised by data leakage due to inappropriate data splitting. More importantly, whether ML models can truly understand the mechanisms of membrane separation has not been revealed. In this study, we evaluate the capability of the XGBoost model to predict micropollutant removal efficiencies of reverse osmosis and nanofiltration membranes. Our results demonstrate that data leakage leads to falsely high prediction accuracy. Using a model interpretation method based on cooperative game theory, we test the knowledge of XGBoost on the mechanisms of membrane separation by quantifying the contributions of input variables to the model predictions. We reveal that XGBoost possesses an adequate understanding of size exclusion, but its knowledge of electrostatic interactions and adsorption is limited. Our findings suggest that future work should focus more on avoiding data leakage and evaluating the mechanistic knowledge of ML models. In addition, high-quality data from more diverse experimental conditions, as well as more informative variables, are needed to improve the accuracy of ML models for predicting membrane performance.
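The data-leakage concern raised in the abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading, that leakage arises when measurements of the same micropollutant end up in both the training and the test set; the abstract does not spell out the splitting scheme the authors used. The record layout, the `compound` key, and both helper functions are hypothetical, and the dataset is synthetic.

```python
import random
from collections import defaultdict


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle individual records, so one compound can land in both sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def grouped_split(records, group_key, test_fraction=0.2, seed=0):
    """Assign whole groups (e.g. compounds) to either train or test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    names = sorted(groups)
    rng = random.Random(seed)
    rng.shuffle(names)
    n_test = max(1, round(len(names) * test_fraction))
    test_names = set(names[:n_test])
    train = [r for r in records if r[group_key] not in test_names]
    test = [r for r in records if r[group_key] in test_names]
    return train, test


def shared_groups(train, test, group_key):
    """Groups present on both sides of a split (a symptom of leakage)."""
    return {r[group_key] for r in train} & {r[group_key] for r in test}


# Synthetic placeholder data: 10 hypothetical compounds, 5 runs each.
records = [
    {"compound": f"compound_{i}", "run": j}
    for i in range(10)
    for j in range(5)
]

rand_train, rand_test = random_split(records)
grp_train, grp_test = grouped_split(records, "compound")

print("random split, shared compounds:",
      len(shared_groups(rand_train, rand_test, "compound")))
print("grouped split, shared compounds:",
      len(shared_groups(grp_train, grp_test, "compound")))
```

With a record-level shuffle, a model can score well on the test set simply because it has already seen the same compound during training. The grouped split holds out entire compounds, so the test score reflects performance on compounds the model has not seen.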

Updated: 2021-08-17
