Database-independent molecular formula annotation using Gibbs sampling through ZODIAC
Nature Machine Intelligence (IF 23.8), Pub Date: 2020-10-13, DOI: 10.1038/s42256-020-00234-6
Marcus Ludwig, Louis-Félix Nothias, Kai Dührkop, Irina Koester, Markus Fleischauer, Martin A. Hoffmann, Daniel Petras, Fernando Vargas, Mustafa Morsy, Lihini Aluwihare, Pieter C. Dorrestein, Sebastian Böcker

The confident high-throughput identification of small molecules is one of the most challenging tasks in mass spectrometry-based metabolomics. Annotating the molecular formula of a compound is the first step towards its structural elucidation. Yet even the annotation of molecular formulas remains highly challenging. This is particularly so for large compounds above 500 daltons, and for de novo annotations, for which we consider all chemically feasible formulas. Here we present ZODIAC, a network-based algorithm for the de novo annotation of molecular formulas. Uniquely, it enables fully automated and swift processing of complete experimental runs, providing high-quality, high-confidence molecular formula annotations. This allows us to annotate novel molecular formulas that are absent from even the largest public structure databases. Our method re-ranks molecular formula candidates by considering joint fragments and losses between fragmentation trees. We employ Bayesian statistics and Gibbs sampling. Thorough algorithm engineering ensures fast processing in practice. We evaluate ZODIAC on five datasets, producing results substantially (up to 16.5-fold) better than for several other methods, including SIRIUS, which is the state-of-the-art algorithm for molecular formula annotation at present. Finally, we report and verify several novel molecular formulas annotated by ZODIAC.
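
The re-ranking idea in the abstract can be illustrated with a toy Gibbs sampler. This is only a minimal sketch under stated assumptions, not ZODIAC's actual implementation. The function name `gibbs_rerank`, the compound and candidate labels, the prior scores, and the single edge weight are all hypothetical. The sketch assumes each compound has a per-candidate prior score (for example from a fragmentation-tree method) and that pairs of candidates from different compounds carry a non-negative similarity weight standing in for shared fragments and losses.

```python
import math
import random
from collections import Counter


def gibbs_rerank(priors, edge_weight, n_sweeps=2000, burn_in=500, seed=0):
    """Estimate per-compound marginal probabilities of formula candidates.

    priors:      {compound: {candidate: prior score > 0}}  (hypothetical input)
    edge_weight: {frozenset({(compound, cand), (compound2, cand2)}): weight}
    """
    rng = random.Random(seed)
    compounds = list(priors)
    # Start from each compound's best candidate under the prior alone.
    state = {c: max(priors[c], key=priors[c].get) for c in compounds}
    counts = {c: Counter() for c in compounds}

    for sweep in range(n_sweeps):
        for c in compounds:
            candidates = list(priors[c])
            weights = []
            for f in candidates:
                # Support from the candidates currently assigned elsewhere.
                support = sum(
                    edge_weight.get(frozenset([(c, f), (d, state[d])]), 0.0)
                    for d in compounds
                    if d != c
                )
                weights.append(priors[c][f] * math.exp(support))
            # Resample this compound's assignment from its conditional.
            state[c] = rng.choices(candidates, weights=weights, k=1)[0]
        if sweep >= burn_in:
            for c in compounds:
                counts[c][state[c]] += 1

    kept = n_sweeps - burn_in
    return {c: {f: counts[c][f] / kept for f in priors[c]} for c in compounds}


# Toy data (invented for illustration): the prior slightly prefers F2 for
# compound A, but F1 shares structure with B's strong candidate G1.
priors = {"A": {"F1": 0.45, "F2": 0.55}, "B": {"G1": 0.9, "G2": 0.1}}
edge_weight = {frozenset([("A", "F1"), ("B", "G1")]): 3.0}
posterior = gibbs_rerank(priors, edge_weight)
best_for_A = max(posterior["A"], key=posterior["A"].get)
```

In this toy setting the prior alone ranks F2 first for compound A, while the sampled marginals should favour F1 because of its link to G1. That is the kind of network-driven re-ranking the abstract describes, although the real scoring model is more involved.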

A preprint version of the article is available at bioRxiv.

Updated: 2020-10-13