Predicting Extraction Selectivity of Acetic Acid in Pervaporation by Machine Learning Models with Data Leakage Management
Environmental Science & Technology ( IF 10.8 ) Pub Date : 2023-03-27 , DOI: 10.1021/acs.est.2c06382
Meiqi Yang 1 , Jun-Jie Zhu 1 , Allyson McGaughey 1, 2 , Sunxiang Zheng 1 , Rodney D Priestley 2 , Zhiyong Jason Ren 1

The extraction of acetic acid and other carboxylic acids from water is an emerging separation need, as these acids are increasingly produced from waste organics and CO2 during carbon valorization. However, the traditional experimental approach to membrane development can be slow and expensive, and machine learning (ML) may provide new insights and guidance for organic acid extraction. In this study, we collected extensive literature data and developed the first ML models for predicting separation factors between acetic acid and water in pervaporation from polymer properties, membrane morphology, fabrication parameters, and operating conditions. Importantly, we assessed seed randomness and data leakage during model development, two problems that have been overlooked in ML studies but lead to over-optimistic results and misinterpreted variable importance. With proper data leakage management, we established a robust model and achieved a root-mean-square error of 0.515 using the CatBoost regression model. The prediction model was also interpreted to elucidate variable importance: the mass ratio was the most important variable in predicting separation factors, whereas polymer concentration and membrane effective area contributed to information leakage. These results demonstrate the advances that ML models offer for membrane design and fabrication, and the importance of rigorous model validation.
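The abstract does not describe how leakage and seed randomness were handled, so the following is only a minimal sketch of one common approach, not the authors' procedure. It assumes that leakage in a literature-compiled dataset arises when rows from the same source study fall on both sides of the train/test split, and it therefore splits by group and repeats the evaluation over several seeds. The group labels, target values, and the mean-value baseline (used here in place of a CatBoost regressor) are all hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def grouped_split(groups, test_fraction, seed):
    """Return (train_idx, test_idx) so that no group is shared between them."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_test = max(1, round(test_fraction * len(unique)))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


def baseline_rmse(y, groups, seed, test_fraction=0.25):
    """Score a predict-the-training-mean baseline on one grouped split."""
    train_idx, test_idx = grouped_split(groups, test_fraction, seed)
    train_mean = sum(y[i] for i in train_idx) / len(train_idx)
    return rmse([y[i] for i in test_idx], [train_mean] * len(test_idx))


# Hypothetical toy data: three rows from each of four source studies.
groups = ["study_a"] * 3 + ["study_b"] * 3 + ["study_c"] * 3 + ["study_d"] * 3
y = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 3.0, 3.2, 2.8, 0.5, 0.4, 0.6]

# Repeat over seeds to see how much the score depends on the split.
scores = [baseline_rmse(y, groups, seed) for seed in range(10)]
print(f"RMSE over 10 seeds: min={min(scores):.3f}, max={max(scores):.3f}")
```

Reporting the spread of scores across seeds, rather than a single number, makes it easier to tell a genuinely robust model from one that benefited from a lucky split.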

Updated: 2023-03-27