Machine learning to predict retention time of small molecules in nano-HPLC.
Analytical and Bioanalytical Chemistry (IF 4.3), Pub Date: 2020-08-29, DOI: 10.1007/s00216-020-02905-0
Sergey Osipenko, Inga Bashkirova, Sergey Sosnin, Oxana Kovaleva, Maxim Fedorov, Eugene Nikolaev, Yury Kostyukevich

Retention time is an important parameter for identification in untargeted LC-MS screening. Precise retention time prediction facilitates annotation and is well established in proteomics. For small molecules, however, a long-standing lack of available experimental data has limited prediction accuracy. Recently introduced large databases of small-molecule retention times make reliable machine learning–based predictions possible across the whole diversity of compounds, and simple projections may extend these predictions to various LC systems and conditions. In our work, we describe a combined approach to predict retention times for nano-HPLC: binary and regression gradient boosting models trained on the METLIN small-molecule dataset are deployed in sequence, and their results are then projected onto nano-HPLC separations using a small number of readily available compounds. With a mean absolute error of 46 s, the proposed model outperforms previous machine learning approaches to this prediction. After transfer to nano-LC conditions, the median absolute (relative) error is less than 155 s (10.8%). To illustrate the applicability of the approach, we used a filter threshold derived from ROC curves to eliminate, on average, 25 to 42% of false positives. Thus, the proposed approach should be used alongside other well-established in silico methods, and their integration may broaden the range of correctly identified molecules.
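The transfer and filtering steps summarized above (projecting METLIN-scale predictions onto nano-HPLC with a few calibration compounds, then rejecting candidates whose retention time error exceeds an ROC-derived cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the gradient boosting models are omitted and their METLIN-scale outputs are taken as given; the "simple projection" is assumed to be a straight-line least-squares fit; the ROC cutoff is assumed to be chosen by maximizing Youden's J (TPR minus FPR). All function names and numbers are hypothetical.

```python
from statistics import median


def fit_projection(pred_metlin, obs_nano):
    """Least-squares line mapping METLIN-scale predictions (s) to nano-HPLC RTs (s).

    Assumption: the projection is linear; the paper may use another form.
    """
    n = len(pred_metlin)
    mean_x = sum(pred_metlin) / n
    mean_y = sum(obs_nano) / n
    sxx = sum((x - mean_x) ** 2 for x in pred_metlin)
    if sxx == 0:
        raise ValueError("need calibrants with at least two distinct predicted RTs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pred_metlin, obs_nano))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(pred, slope, intercept):
    """Map one METLIN-scale prediction onto the nano-HPLC time axis."""
    return slope * pred + intercept


def median_errors(predicted, observed):
    """Median absolute error (s) and median relative error (fraction of observed RT)."""
    abs_err = [abs(p - o) for p, o in zip(predicted, observed)]
    rel_err = [e / o for e, o in zip(abs_err, observed)]
    return median(abs_err), median(rel_err)


def youden_threshold(true_errors, false_errors):
    """Choose the RT-error cutoff that maximizes TPR - FPR along the ROC curve.

    A candidate is accepted when its |predicted - observed| RT error is <= cutoff.
    Assumption: Youden's J is the ROC criterion (the abstract does not say).
    """
    best_t, best_j = None, float("-inf")
    for t in sorted(set(true_errors) | set(false_errors)):
        tpr = sum(e <= t for e in true_errors) / len(true_errors)
        fpr = sum(e <= t for e in false_errors) / len(false_errors)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t


def filter_candidates(candidates, observed_rt, threshold):
    """Keep (name, projected_rt) candidates whose RT error is within the cutoff."""
    return [name for name, rt in candidates if abs(rt - observed_rt) <= threshold]


if __name__ == "__main__":
    # Hypothetical calibrants: METLIN-scale predictions vs. observed nano-HPLC RTs.
    slope, intercept = fit_projection([100, 300, 500, 700], [600, 1000, 1400, 1800])
    print(slope, intercept)  # 2.0 400.0
    # Hypothetical RT errors (s) for correct and incorrect annotations.
    cutoff = youden_threshold([10, 20, 30, 200], [50, 150, 300, 400])
    print(cutoff)  # 30
    print(filter_candidates([("A", 1000), ("B", 1400), ("C", 1020)], 1010, cutoff))  # ['A', 'C']
```

In a real workflow the candidate list would come from an MS-based database search and the METLIN-scale predictions from the trained models; both lie outside this sketch. A linear fit is used here only because it needs very few calibrants, which matches the abstract's emphasis on a small number of readily available compounds.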

Updated: 2020-10-13