A data-driven binary-classification framework for oil fingerprinting analysis
Environmental Research (IF 8.3), Pub Date: 2021-06-08, DOI: 10.1016/j.envres.2021.111454
Yifu Chen, Bing Chen, Xing Song, Qiao Kang, Xudong Ye, Baiyu Zhang

A marine oil spill is one of the most challenging environmental issues, with severe long-term impacts on ecosystems and human society. Oil dispersants are widely used as treating agents in oil spill response operations. However, dispersants significantly alter the behavior of the dispersed oil and consequently complicate oil fingerprinting analysis. In this study, machine learning (ML) was first introduced to oil fingerprinting analysis through the development of a data-driven binary classification framework. The modeling integrated dimensionality reduction algorithms (e.g., principal component analysis, PCA) to distinguish WCO from CDO. Five groups of biomarkers were selected: terpanes, steranes, triaromatic steranes (TA-steranes), monoaromatic steranes (MA-steranes), and diamantanes. Different feature spaces were created from the diagnostic indices of the biomarkers, and six ML algorithms were applied to compare and optimize the modeling process: k-nearest neighbor (KNN), support vector classifier (SVC), random forest classifier (RFC), decision tree classifier (DTC), logistic regression classifier (LRC), and ensemble vote classifier (EVC). Hyperparameter optimization and cross-validation through GridSearchCV were applied to prevent overfitting and improve model accuracy. Model performance was evaluated by the model score and by the F-score derived from confusion matrices. The results indicated that the RFC algorithm applied to the diamantanes dataset performed best, delivering the highest F-score (0.871), whereas the lowest F-score (0.792) came from the EVC algorithm applied to the TA-steranes dataset after PCA retaining 95% of the variance. Therefore, diamantanes were recommended as the most suitable biomarker for distinguishing WCO and CDO to aid oil fingerprinting under the conditions of this study. The results demonstrated the potential of the proposed method as an analysis tool for oil spill source identification through ML-aided oil fingerprinting.
The study also showed the value of ML methods in oil spill response research and practice.
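The abstract describes a comparison pipeline: reduce a biomarker feature space with PCA (keeping 95% of the variance), tune six classifiers with cross-validated `GridSearchCV`, and rank them by an F-score read off the confusion matrix. The Python sketch below, using scikit-learn, shows how such a comparison could be wired up. It is not the authors' code: the synthetic data from `make_classification` (standing in for biomarker diagnostic indices, with label 0 = WCO and label 1 = CDO), the standardization step, applying PCA before every model, the hyperparameter grids, hard voting for the EVC, 5-fold cross-validation, and the helper `f1_from_confusion` are all illustrative assumptions. The F-score is computed as F = 2TP / (2TP + FP + FN), and "model score" is assumed to mean test-set accuracy, which is what scikit-learn's default `score` returns.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def f1_from_confusion(cm):
    """Positive-class F-score from a 2x2 confusion matrix [[TN, FP], [FN, TP]]."""
    tn, fp, fn, tp = np.asarray(cm).ravel()
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical stand-in for one biomarker feature space: 12 synthetic
# "diagnostic indices" per sample; label 0 = WCO, label 1 = CDO (assumed).
X, y = make_classification(n_samples=200, n_features=12, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Base learners for the ensemble vote classifier (hard majority voting assumed).
base = [
    ("knn", KNeighborsClassifier()),
    ("svc", SVC()),
    ("rfc", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("dtc", DecisionTreeClassifier(random_state=0)),
    ("lrc", LogisticRegression(max_iter=1000)),
]

# Illustrative hyperparameter grids; the study's actual grids are not stated.
candidates = {
    "KNN": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
    "SVC": (SVC(), {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}),
    "RFC": (RandomForestClassifier(random_state=0),
            {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 5]}),
    "DTC": (DecisionTreeClassifier(random_state=0),
            {"clf__max_depth": [None, 3, 5]}),
    "LRC": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    "EVC": (VotingClassifier(base, voting="hard"),
            {"clf__knn__n_neighbors": [3, 5]}),
}

results = {}
for name, (clf, grid) in candidates.items():
    # Standardize, then keep just enough principal components for 95% variance.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=0.95, svd_solver="full")),
        ("clf", clf),
    ])
    # 5-fold cross-validated grid search, selecting hyperparameters by F-score.
    search = GridSearchCV(pipe, grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)

    # Evaluate the refitted best model on the held-out split.
    y_pred = search.predict(X_test)
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    f1 = f1_from_confusion(cm)
    assert np.isclose(f1, f1_score(y_test, y_pred))  # cross-check with sklearn
    results[name] = (search.best_estimator_.score(X_test, y_test), f1)

# Rank models by F-score, highest first.
for name, (acc, f1) in sorted(results.items(), key=lambda kv: kv[1][1],
                              reverse=True):
    print(f"{name}: model score (accuracy) = {acc:.3f}, F-score = {f1:.3f}")
```

Replacing `X` and `y` with a real feature table built from one biomarker group (for example, diamantane diagnostic ratios) would give the kind of per-dataset comparison the abstract reports. The scores printed for the synthetic data say nothing about the study's actual results.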



Updated: 2021-06-23