Probabilistic metabolite annotation using retention time prediction and meta-learned projections
Journal of Cheminformatics (IF 7.1), Pub Date: 2022-06-07, DOI: 10.1186/s13321-022-00613-8
Constantino A García¹, Alberto Gil-de-la-Fuente¹,², Coral Barbas², Abraham Otero¹,²

Retention time information is used for metabolite annotation in metabolomic experiments. However, its usefulness is hindered by the limited availability of experimental retention time data in metabolomic databases and by the lack of reproducibility between different chromatographic methods. Accurate prediction of retention times for a given chromatographic method would therefore be a valuable support for metabolite annotation. We have trained state-of-the-art machine learning regressors using the 80,038 experimental retention times from the METLIN Small Molecule Retention Time (SMRT) dataset. The models included deep neural networks, deep kernel learning, several gradient boosting models, and a blending approach. A total of 5,666 molecular descriptors and 2,214 fingerprints (MACCS166, Extended Connectivity, and Path fingerprints) were generated with the alvaDesc software. The models were trained using only the descriptors, only the fingerprints, and both types of features simultaneously. Bayesian hyperparameter search was used for parameter tuning. To avoid data leakage when reporting the performance metrics, nested cross-validation was employed. The best results were obtained by a heavily regularized deep neural network trained with cosine annealing with warm restarts and stochastic weight averaging, achieving mean and median absolute errors of $$39.2 \pm 1.2\;\mathrm{s}$$ and $$17.2 \pm 0.9\;\mathrm{s}$$, respectively. To the best of our knowledge, these are the most accurate predictions published to date on the SMRT dataset. To project retention times between chromatographic methods, we propose a novel Bayesian meta-learning approach that can learn from just a few molecules. By applying this projection between the deep neural network retention time predictions and a given chromatographic method, our approach can be integrated into a metabolite annotation workflow to obtain z-scores for the candidate annotations.
To this end, it suffices that as few as 10 molecules of a given experiment have been identified (presumably by using pure metabolite standards). Using z-scores makes it possible to account for the uncertainty of the projection when ranking candidates, not only its accuracy. In this scenario, our results show that in 68% of the cases the correct molecule was among the top three candidates after filtering by mass and ranking by z-score. This demonstrates the usefulness of this information to support metabolite annotation. Python code is available on GitHub at https://github.com/constantino-garcia/cmmrt.
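The best model's training recipe combines two generic techniques: cosine annealing with warm restarts (SGDR), which decays the learning rate along a half cosine and then resets it, and stochastic weight averaging (SWA), which averages weight snapshots taken along the trajectory. The following is a minimal, framework-free sketch of both under stated assumptions; it is not the cmmrt implementation. The helper names (`cosine_warm_restart_lr`, `swa_update`, `train_toy`), the schedule parameters (`eta_max`, `eta_min`, `t0`, `t_mult`), the choice of one SWA snapshot at the end of each cycle, and the toy quadratic loss are all hypothetical and are not settings reported in the paper.

```python
import math
from typing import List, Optional, Tuple


def cosine_warm_restart_lr(epoch: int, eta_max: float = 1e-3, eta_min: float = 1e-5,
                           t0: int = 10, t_mult: int = 2) -> float:
    """Learning rate at `epoch` under cosine annealing with warm restarts.

    The first cycle lasts `t0` epochs and each later cycle is `t_mult` times
    longer. Within a cycle the rate decays from eta_max towards eta_min along
    a half cosine, then jumps back to eta_max (the "warm restart").
    """
    t_i, t_cur = t0, epoch
    # Skip over completed cycles to find the position inside the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


def swa_update(avg: Optional[List[float]], weights: List[float],
               n_averaged: int) -> Tuple[List[float], int]:
    """Fold one weight snapshot into the running SWA mean."""
    if avg is None:
        return list(weights), 1
    n = n_averaged + 1
    # Incremental mean: avg_n = avg_{n-1} + (w - avg_{n-1}) / n
    return [a + (w - a) / n for a, w in zip(avg, weights)], n


def train_toy(n_epochs: int = 70) -> List[float]:
    """Hypothetical toy run: gradient descent on f(w) = sum((w - 3)^2) / 2,
    taking an SWA snapshot at the last epoch of every cosine cycle."""
    w = [0.0, 10.0]
    avg, n = None, 0
    for epoch in range(n_epochs):
        lr = cosine_warm_restart_lr(epoch, eta_max=0.5, eta_min=0.01)
        w = [wi - lr * (wi - 3.0) for wi in w]  # the gradient of f is (w - 3)
        # The rate is about to jump back up, so this epoch closes a cycle.
        if cosine_warm_restart_lr(epoch + 1, eta_max=0.5, eta_min=0.01) > lr:
            avg, n = swa_update(avg, w, n)
    return avg


if __name__ == "__main__":
    print([round(cosine_warm_restart_lr(e), 6) for e in (0, 5, 9, 10, 30)])
    print(train_toy())
```

Snapshotting at the end of each cycle, where the learning rate is lowest, is one common way to pair SWA with cyclic schedules. Deep learning frameworks ship their own versions of both pieces, for example PyTorch's `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` and `torch.optim.swa_utils`.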

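The z-score ranking step can be sketched as follows, assuming that the meta-learned projection yields a Gaussian predictive mean and standard deviation of each candidate's retention time under the target chromatographic method. The `Candidate` type, the functions `rank_by_z` and `top_k_accuracy`, the 10 ppm mass tolerance, and all numbers in the example are hypothetical illustrations; this is not the cmmrt API and not data from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical candidate annotation for one observed feature."""
    name: str
    mass: float     # monoisotopic mass (Da)
    rt_mean: float  # projected retention time, predictive mean (s)
    rt_std: float   # projected retention time, predictive std (s)


def rank_by_z(observed_mass: float, observed_rt: float,
              candidates: Sequence[Candidate],
              ppm_tol: float = 10.0) -> List[Tuple[str, float]]:
    """Keep candidates within ppm_tol of the observed mass, then sort by |z|."""
    kept = [c for c in candidates
            if abs(c.mass - observed_mass) / observed_mass * 1e6 <= ppm_tol]
    # z counts how many predictive standard deviations the observation lies
    # from the projected retention time, so a discrepancy weighs less when
    # the prediction is uncertain.
    scored = [(c.name, (observed_rt - c.rt_mean) / c.rt_std) for c in kept]
    return sorted(scored, key=lambda item: abs(item[1]))


def top_k_accuracy(queries: Sequence[Tuple[float, float, Sequence[Candidate], str]],
                   k: int = 3) -> float:
    """Fraction of queries whose true molecule is among the top-k ranked candidates."""
    hits = 0
    for mass, rt, cands, true_name in queries:
        top = [name for name, _ in rank_by_z(mass, rt, cands)[:k]]
        hits += true_name in top
    return hits / len(queries)


candidates = [
    Candidate("cand_A", 180.0634, rt_mean=280.0, rt_std=5.0),
    Candidate("cand_B", 180.0630, rt_mean=250.0, rt_std=40.0),
    Candidate("cand_C", 180.0800, rt_mean=300.0, rt_std=10.0),  # fails the mass filter
    Candidate("cand_D", 180.0640, rt_mean=310.0, rt_std=10.0),
]
ranked = rank_by_z(180.0634, 300.0, candidates)
print(ranked)
```

In this toy data, `cand_B`, whose predicted retention time is 50 s from the observation, ranks above `cand_A`, which is only 20 s away, because its larger predictive standard deviation makes the discrepancy less surprising; `cand_C` is discarded by the mass filter before ranking.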
Updated: 2022-06-07