A comparison of three liquid chromatography (LC) retention time prediction models
Talanta ( IF 6.1 ) Pub Date : 2018-01-11 , DOI: 10.1016/j.talanta.2018.01.022
Andrew D. McEachran , Kamel Mansouri , Seth R. Newton , Brandiese E.J. Beverly , Jon R. Sobus , Antony J. Williams

High-resolution mass spectrometry (HRMS) data has revolutionized the identification of environmental contaminants through non-targeted analysis (NTA). However, chemical identification remains challenging due to the vast number of unknown molecular features typically observed in environmental samples. Advanced data processing techniques are required to improve chemical identification workflows. The ideal workflow brings together a variety of data and tools to increase the certainty of identification. One such tool is chromatographic retention time (RT) prediction, which can be used to reduce the number of possible suspect chemicals within an observed RT window. This paper compares the relative predictive ability and applicability to NTA workflows of three RT prediction models: (1) a logP (octanol-water partition coefficient)-based model using EPI Suite™ logP predictions; (2) a commercially available ACD/ChromGenius model; and (3) a newly developed Quantitative Structure Retention Relationship model called OPERA-RT. Models were developed using the same training set of 78 compounds with experimental RT data and evaluated for external predictivity on an identical test set of 19 compounds. Both the ACD/ChromGenius and OPERA-RT models outperformed the EPI Suite™ logP-based RT model (training and test set R² = 0.81 and 0.92 for ACD/ChromGenius, 0.86 and 0.83 for OPERA-RT, and 0.66 and 0.69 for the EPI Suite™ logP-based model). Further, both OPERA-RT and ACD/ChromGenius predicted 95% of RTs within a ±15% chromatographic time window of experimental RTs. Based on these results, we simulated an NTA workflow with a ten-fold larger list of candidate structures generated for formulae of the known test set chemicals using the U.S. EPA's CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard). RTs for all candidates were predicted using both ACD/ChromGenius and OPERA-RT, and RT screening windows were assessed for their ability to filter out unlikely candidate chemicals and enhance potential identification. Compared to ACD/ChromGenius, OPERA-RT screened out a greater percentage of candidate structures within a 3-min RT window (60% vs. 40%) but retained fewer of the known chemicals (42% vs. 83%). By several metrics, the OPERA-RT model, generated as a proof-of-concept using a limited set of open source data, performed as well as the commercial tool ACD/ChromGenius when constrained to the same small training and test sets. As the availability of RT data increases, we expect the OPERA-RT model's predictive ability will increase.
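
The evaluation steps described in the abstract (fit an RT model, score it by R², check how many predictions fall within a tolerance of the experimental RT, then screen candidates by an RT window) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that the logP-based model is a simple linear fit of RT against logP, that the ±15% tolerance is taken relative to each experimental RT, and that the screening window is centred on the observed RT with a chosen half-width. All function names, compound names, and numbers below are invented for illustration and do not come from the paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fraction_within(observed, predicted, rel_tol=0.15):
    """Share of predictions within +/- rel_tol of the experimental RT.

    Assumption: the tolerance is relative to each experimental RT.
    """
    hits = sum(abs(p - o) <= rel_tol * o for o, p in zip(observed, predicted))
    return hits / len(observed)


def screen_candidates(candidates, observed_rt, half_width):
    """Keep (name, predicted_rt) pairs whose RT lies inside the window.

    Assumption: the window is centred on observed_rt, in minutes.
    """
    return [name for name, rt in candidates if abs(rt - observed_rt) <= half_width]


# Hypothetical toy data: predicted logP values and experimental RTs (minutes).
train_logp = [1.0, 2.0, 3.0, 4.0]
train_rt = [3.0, 5.5, 6.5, 9.0]

slope, intercept = fit_line(train_logp, train_rt)
train_pred = [slope * lp + intercept for lp in train_logp]

print("R^2:", round(r_squared(train_rt, train_pred), 3))
print("within 15%:", fraction_within(train_rt, train_pred))

# Hypothetical candidate structures sharing one formula, with predicted logP.
candidate_logp = {"candidate_A": 1.5, "candidate_B": 2.9, "candidate_C": 4.3}
candidate_rt = [(name, slope * lp + intercept) for name, lp in candidate_logp.items()]

# Screen against an observed feature at 6.5 min with a 1.5 min half-width.
kept = screen_candidates(candidate_rt, observed_rt=6.5, half_width=1.5)
print("retained candidates:", kept)
```

A narrower window removes more candidates but also raises the risk of discarding the true structure, which is the trade-off the abstract reports between the percentage of candidates screened out and the percentage of known chemicals retained.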




Updated: 2018-01-11