Comparing the Predictive Performance, Interpretability, and Accessibility of Machine Learning and Physically Based Models for Water Treatment, ACS ES&T Engineering



ACS ES&T Engineering Pub Date : 2020-11-16 , DOI: 10.1021/acsestengg.0c00053
Dewey W. Dunnington ₁ , Benjamin F. Trueman ₁ , William J. Raseman ₂ , Lindsay E. Anderson ₁ , Graham A. Gagnon ₁


Using an organic carbon removal data set (n = 500), we compared a physically based semiempirical coagulation model (Langmuir sorption-removal) and three machine learning (ML) modeling methods using quantitative (model performance) and qualitative (model interpretability and accessibility) criteria to identify potential barriers to adoption in water treatment. We found that a gradient-boosted tree ensemble and an artificial neural network provided the most accurate predictions of organic carbon removal, and that all models provided accurate predictions when test data were well-characterized by the training data. We also confirmed that the physically based model had the lowest prediction error when extrapolating. As assessed by the ability of model predictions to be reconciled with industry-specific knowledge, the physically based and linear models were the most interpretable. As assessed by the ability of utilities to implement models on an ad hoc basis, the physically based and multiple linear models were deemed to be the most accessible. Collectively, our study suggests that ML-based models offer the best predictive performance when adequate training data are available and that physically based models are best suited when extrapolation is necessary. Potential solutions for limited interpretability of ML-based models include variable importance and sensitivity analysis; a potential solution for limited accessibility of ML-based models is training of stakeholders in modeling techniques.
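To give a rough sense of what a Langmuir sorption-removal calculation involves, the sketch below solves a simplified mass balance for the organic carbon remaining after coagulation. It is not the semiempirical model evaluated in the paper: it assumes that all dissolved organic carbon (DOC) is sorbable and that the Langmuir parameters are constants. The function name, parameter names, units, and example values are hypothetical.

```python
import math


def residual_doc(c0, dose, a, b):
    """Equilibrium DOC (mg/L) left in solution after coagulation.

    Simplified Langmuir sorption-removal balance (an assumption, not the
    paper's fitted model):

        (c0 - c) / dose = a * b * c / (1 + b * c)

    c0   -- initial sorbable DOC, mg/L
    dose -- coagulant dose, mmol/L (hypothetical unit choice)
    a    -- maximum sorption capacity, mg DOC per mmol coagulant
    b    -- Langmuir affinity constant, L/mg
    """
    if c0 < 0 or dose < 0 or a <= 0 or b <= 0:
        raise ValueError("c0 and dose must be >= 0; a and b must be > 0")

    # Rearranging the balance gives b*c^2 + k*c - c0 = 0 with the
    # coefficient k below; the positive root is the physical solution.
    k = 1.0 + a * b * dose - b * c0
    return (-k + math.sqrt(k * k + 4.0 * b * c0)) / (2.0 * b)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the study.
    for dose in (0.0, 0.1, 0.2, 0.4):
        remaining = residual_doc(c0=5.0, dose=dose, a=10.0, b=0.5)
        print(f"dose={dose:.1f} mmol/L -> residual DOC={remaining:.3f} mg/L")
```

Because the prediction follows from a mass balance rather than from patterns in the training data, a model of this kind remains bounded and monotonic in dose outside the fitted range, which is consistent with the abstract's remark about extrapolation.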


Updated: 2020-11-16
