High-confidence structural annotation of metabolites absent from spectral libraries
Nature Biotechnology (IF 46.9), Pub Date: 2021-10-14, DOI: 10.1038/s41587-021-01045-9
Martin A Hoffmann 1,2, Louis-Félix Nothias 3,4, Marcus Ludwig 1, Markus Fleischauer 1, Emily C Gentry 3, Michael Witting 5,6, Pieter C Dorrestein 3,7, Kai Dührkop 1, Sebastian Böcker 1

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.
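The abstract names kernel density P value estimation as one ingredient of the confidence score but does not spell out the computation. As a rough illustration only, and not COSMIC's actual implementation, the sketch below estimates an upper-tail P value for one hit score against a set of null (for example, decoy) scores using a Gaussian kernel density estimate. The function names, the use of decoy scores as the null sample, and the normal-reference bandwidth rule are all assumptions made for this example.

```python
import math
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule-of-thumb bandwidth for a Gaussian kernel.

    Assumption for this sketch; the paper may choose the bandwidth differently.
    """
    n = len(samples)
    sd = statistics.stdev(samples)
    return 1.06 * sd * n ** (-1 / 5)


def kde_p_value(observed, null_scores, bandwidth=None):
    """Upper-tail P value of `observed` under a Gaussian KDE of `null_scores`.

    The KDE is an equal-weight mixture of Gaussians centred on each null score,
    so its tail probability is the average of the individual Gaussian tails.
    """
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(null_scores)
    tails = [
        # P(X >= observed) for X ~ Normal(s, h^2)
        0.5 * math.erfc((observed - s) / (h * math.sqrt(2)))
        for s in null_scores
    ]
    return sum(tails) / len(tails)


if __name__ == "__main__":
    # Hypothetical null scores; real scores would come from the search engine.
    null = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.1]
    for score in (0.0, 1.5, 3.0):
        print(score, kde_p_value(score, null))
```

A smaller P value means the hit score is less likely to have arisen from the null distribution. In the workflow described above, such a value would be only one input feature to the support vector machine, not the final confidence score.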




Updated: 2021-10-14