当前位置: X-MOL 学术Chemosphere › 论文详情
Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)
Source Allocation of Per- and Polyfluoroalkyl Substances (PFAS) with Supervised Machine Learning: Classification Performance and the Role of Feature Selection in an Expanded Dataset
Chemosphere ( IF 8.8 ) Pub Date : 2021-02-26 , DOI: 10.1016/j.chemosphere.2021.130124
Tohren C.G. Kibbey , Rafal Jabrzemski , Denis M. O’Carroll

This work explores the use of supervised machine learning as a tool for identifying the source of per- and polyfluorinated alkyl substances (PFAS) in water samples on the basis of the detected component concentrations. Specifically, the work focuses on distinguishing between PFAS used in aqueous film forming foam (AFFF) fire suppression applications, and PFAS from other sources. The fact that many sites contaminated with legacy PFOS-based AFFF formulations are dominated by perfluorinated sulfonates can make it tempting to naïvely classify samples dominated by perfluorinated sulfonates as being of AFFF origin. However, a large fraction of samples do not follow this pattern, including some of the most important cases, such as legacy PFOS-based AFFF far from its source. Although PFAS composition can vary substantially at a site as a result of mobility differences between components and other factors, the hypothesis driving the work is that compositional patterns created in the environment can be recognized across different sites by machine learning, and used for source allocation. This work builds on earlier preliminary work by the authors based on a small dataset. This work is based on a much larger 8105-sample dataset, and explores different preprocessing approaches, as well as how feature selection impacts classification performance. The results of this work strongly support the idea that supervised machine learning based on composition can identify patterns that can be used to distinguish PFAS sources. The results provide new insights into selection of classifiers and features for source identification based on PFAS component composition.



中文翻译:

具有监督机器学习的全氟和多氟烷基物质(PFAS)的源分配:分类性能和扩展数据集中特征选择的作用

这项工作探索了有监督的机器学习作为一种工具的功能,该工具可根据检测到的成分浓度来识别水样品中的全氟和多氟烷基物质(PFAS)的来源。具体而言,该工作着重于区分用于水性成膜泡沫(AFFF)灭火应用中的PFAS和来自其他来源的PFAS。许多被传统的基于全氟辛烷磺酸的AFFF配方污染的场所都被全氟磺酸盐所占据,这一事实使人们很容易将纯全氟磺酸盐所占的样品简单地分类为AFFF起源。但是,很大一部分样本没有遵循这种模式,包括一些最重要的情况,例如远离源头的基于PFOS的传统AFFF。尽管由于组件和其他因素之间的迁移率差异,PFAS的组成可能在一个站点上发生很大的变化,但推动这项工作的假说是,可以通过机器学习在不同站点之间识别环境中创建的组成模式,并将其用于源分配。这项工作是建立在作者根据一个较小的数据集进行的早期初步工作的基础上的。这项工作基于更大的8105样本数据集,并探讨了不同的预处理方法以及特征选择如何影响分类性能。这项工作的结果有力地支持了这样的想法,即基于组合的有监督的机器学习可以识别可用于区分PFAS来源的模式。

更新日期:2021-02-26
down
wechat
bug