Applications of Machine Learning to In Silico Quantification of Chemicals without Analytical Standards.
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2020-05-07, DOI: 10.1021/acs.jcim.9b01096
Dimitri Panagopoulos Abrahamsson, June-Soo Park, Randolph R Singh, Marina Sirota, Tracey J Woodruff

Non-targeted analysis provides a comprehensive approach for analyzing environmental and biological samples for nearly all chemicals present. One of the main shortcomings of current analytical methods and workflows is that they cannot provide quantitative information, which is an important obstacle to understanding environmental fate and human exposure. Herein, we present an in silico quantification method using machine learning for chemicals analyzed using electrospray ionization (ESI). We considered three data sets from different instrumental setups: (i) capillary electrophoresis electrospray ionization mass spectrometry (CE-MS) in positive ionization mode (ESI+), (ii) liquid chromatography quadrupole time-of-flight mass spectrometry (LC-QTOF/MS) in ESI+, and (iii) LC-QTOF/MS in negative ionization mode (ESI−). We developed and applied two machine-learning algorithms, a random forest (RF) and an artificial neural network (ANN), to predict the relative response factors (RRFs) of different chemicals from their physicochemical properties. Chemical concentrations can then be calculated by dividing the measured abundance of a chemical, as peak area or peak height, by its corresponding RRF. We evaluated our models and tested their predictive power using 5-fold cross-validation (CV) and y-randomization. Both the RF and the ANN models showed great promise in predicting RRFs. However, the accuracy of the predictions depended on the data set composition and the experimental setup. For the CE-MS ESI+ data set, the best model predicted measured RRFs with a mean absolute error (MAE) of 0.19 log units and a cross-validation coefficient of determination (Q²) of 0.84 for the testing set. For the LC-QTOF/MS ESI+ data set, the best model predicted measured RRFs with an MAE of 0.32 and a Q² of 0.40. For the LC-QTOF/MS ESI− data set, the best model predicted measured RRFs with an MAE of 0.50 and a Q² of 0.20. Our findings suggest that machine-learning algorithms can be used to predict concentrations of non-targeted chemicals with reasonable uncertainties, especially in ESI+, while application to ESI− remains a more challenging problem.
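
The quantification step and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the model outputs RRFs on a log10 scale (the abstract reports MAE in log units) and that Q² follows the conventional definition, one minus the ratio of the cross-validated residual sum of squares to the total sum of squares. All function names and numeric inputs below are hypothetical and chosen only for illustration.

```python
def concentration_from_rrf(abundance, log10_rrf):
    """Concentration = measured abundance (peak area or height) / RRF.

    Assumption: the RRF is supplied as a log10 value, so it is
    converted back to linear scale before dividing.
    """
    return abundance / (10.0 ** log10_rrf)


def mean_absolute_error(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def q2(observed, cv_predicted):
    """Cross-validated coefficient of determination.

    Conventional definition (assumed here): 1 - PRESS / TSS, where
    PRESS uses predictions made for held-out samples only.
    """
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, cv_predicted))
    total = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / total


def fold_error(mae_log10):
    """Multiplicative error implied by an absolute error in log10 units."""
    return 10.0 ** mae_log10


# Hypothetical log10(RRF) values, not data from the paper.
observed_log_rrf = [1.0, 2.0, 3.0, 4.0]
cv_predicted_log_rrf = [1.2, 1.9, 3.1, 3.8]

mae = mean_absolute_error(observed_log_rrf, cv_predicted_log_rrf)
q2_value = q2(observed_log_rrf, cv_predicted_log_rrf)

# Hypothetical peak area of 50000 with a predicted log10(RRF) of 3.0.
concentration = concentration_from_rrf(50000.0, 3.0)

print(f"MAE = {mae:.2f} log units, Q2 = {q2_value:.2f}")
print(f"Estimated concentration = {concentration:.1f}")
print(f"Fold error at MAE 0.19 = {fold_error(0.19):.2f}")
print(f"Fold error at MAE 0.50 = {fold_error(0.50):.2f}")
```

Under the log10 assumption, an error of 0.19 log units corresponds to a factor of roughly 1.5 in the predicted RRF, and therefore in the estimated concentration, while an error of 0.50 log units corresponds to a factor of roughly 3.2.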

Updated: 2020-06-23