Database-independent molecular formula annotation using Gibbs sampling through ZODIAC

Ludwig, Marcus; Nothias, Louis-Félix; Dührkop, Kai; Koester, Irina; Fleischauer, Markus; Hoffmann, Martin A.; Petras, Daniel; Vargas, Fernando; Morsy, Mustafa; Aluwihare, Lihini; Dorrestein, Pieter C.; Böcker, Sebastian

doi:10.1038/s42256-020-00234-6

Article
Published: 13 October 2020

Nature Machine Intelligence volume 2, pages 629–641 (2020)

A Publisher Correction to this article was published on 21 October 2020

This article has been updated

A preprint version of the article is available at bioRxiv.

The confident high-throughput identification of small molecules is one of the most challenging tasks in mass spectrometry-based metabolomics. Annotating the molecular formula of a compound is the first step towards its structural elucidation. Yet even the annotation of molecular formulas remains highly challenging. This is particularly so for large compounds above 500 daltons, and for de novo annotations, for which we consider all chemically feasible formulas. Here we present ZODIAC, a network-based algorithm for the de novo annotation of molecular formulas. Uniquely, it enables fully automated and swift processing of complete experimental runs, providing high-quality, high-confidence molecular formula annotations. This allows us to annotate novel molecular formulas that are absent from even the largest public structure databases. Our method re-ranks molecular formula candidates by considering joint fragments and losses between fragmentation trees. We employ Bayesian statistics and Gibbs sampling. Thorough algorithm engineering ensures fast processing in practice. We evaluate ZODIAC on five datasets, producing results substantially (up to 16.5-fold) better than for several other methods, including SIRIUS, which is the state-of-the-art algorithm for molecular formula annotation at present. Finally, we report and verify several novel molecular formulas annotated by ZODIAC.
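The re-ranking idea can be illustrated with a toy Gibbs sampler. The sketch below is not ZODIAC's implementation: the function `gibbs_rerank`, the prior log-scores, the edge weights and the example formulas are all hypothetical, and the simple additive scoring only stands in for the Bayesian model of the paper. Each compound holds one active candidate; candidates are resampled in turn, conditioned on the candidates currently active for the other compounds, and the visit frequencies after burn-in serve as re-ranked scores.

```python
import math
import random
from collections import Counter


def gibbs_rerank(priors, edges, n_iter=5000, burn_in=500, seed=0):
    """Estimate marginal probabilities of formula candidates by Gibbs sampling.

    priors: {compound: {formula: prior log-score}}  (hypothetical input)
    edges:  {frozenset({(compound, formula), (compound, formula)}): weight}
    """
    rng = random.Random(seed)
    compounds = list(priors)
    # Start from the candidate with the best prior score for each compound.
    state = {c: max(priors[c], key=priors[c].get) for c in compounds}
    counts = {c: Counter() for c in compounds}

    for it in range(n_iter + burn_in):
        for c in compounds:
            candidates = list(priors[c])
            log_weights = []
            for f in candidates:
                score = priors[c][f]
                # Add support from the candidates currently active elsewhere.
                for other in compounds:
                    if other != c:
                        key = frozenset({(c, f), (other, state[other])})
                        score += edges.get(key, 0.0)
                log_weights.append(score)
            top = max(log_weights)  # subtract the maximum for numerical stability
            weights = [math.exp(w - top) for w in log_weights]
            state[c] = rng.choices(candidates, weights=weights, k=1)[0]
        if it >= burn_in:
            for c in compounds:
                counts[c][state[c]] += 1

    return {c: {f: counts[c][f] / n_iter for f in priors[c]} for c in compounds}


# Toy input: compound A's second-ranked candidate is supported by B and C.
priors = {
    "A": {"C7H8O4": 0.0, "C6H12O6": -0.5},
    "B": {"C6H10O5": 0.0, "C5H14N2O3": 0.0},
    "C": {"C12H22O11": 0.0},
}
edges = {
    frozenset({("A", "C6H12O6"), ("B", "C6H10O5")}): 2.0,
    frozenset({("A", "C6H12O6"), ("C", "C12H22O11")}): 2.0,
}

marginals = gibbs_rerank(priors, edges)
best_for_a = max(marginals["A"], key=marginals["A"].get)
print(best_for_a, marginals["A"])
```

In this toy network the candidate that starts second for compound A ends up ranked first, because it shares support with the candidates of its neighbours.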

**Fig. 1: Molecular formula annotation error rates.**

**Fig. 2: Percentage of correct annotations and number of compounds in relation to ZODIAC score.**

**Fig. 3: Annotation of a novel bromine-containing compound in the diatoms dataset.**

**Fig. 4: Annotation of a novel chlorine- and iodine-containing compound in the diatoms dataset.**

**Fig. 5: Running time comparison of SIRIUS and ZODIAC on five datasets.**

Data availability

Input mzML/mzXML files for the five datasets are available at MassIVE (https://massive.ucsd.edu/) under the following accession numbers: dendroides (MSV000080502), NIST1950 (MSV000081364), tomato (MSV000081463), diatoms (MSV000081731) and mice stool (MSV000079949). SIRIUS and ZODIAC results and a virtual machine on which to reproduce the data are available from https://bio.informatik.uni-jena.de/data/ and https://doi.org/10.6084/m9.figshare.12911171. Source data are provided with this paper.

Code availability

ZODIAC has been integrated into the SIRIUS software and is written in Java. It is open source under the GNU General Public License (version 3), and works on Windows, macOS X and Linux. A command-line version allows batch processing and results can be visualized in a graphical user interface. We provide executable binaries, example files and additional information on the ZODIAC website (https://bio.informatik.uni-jena.de/software/zodiac/). A source copy is hosted on GitHub (https://github.com/boecker-lab/sirius-libs⁶⁰); the branch ‘zodiac_in_sirius_4_release’ contains the SIRIUS and ZODIAC code used for evaluation in this paper.

Change history

21 October 2020
An amendment to this paper has been published and can be accessed via a link at the top of the paper.

References

1. Wishart, D. S. et al. HMDB 4.0: the human metabolome database for 2018. Nucl. Acids Res. 46, D608–D617 (2018).
2. Kim, S. et al. PubChem substance and compound databases. Nucl. Acids Res. 44, D1202–D1213 (2016).
3. Ruttkies, C., Schymanski, E. L., Wolf, S., Hollender, J. & Neumann, S. MetFrag relaunched: incorporating strategies beyond in silico fragmentation. J. Cheminform. 8, 3 (2016).
4. Allen, F., Greiner, R. & Wishart, D. Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification. Metabolomics 11, 98–110 (2015).
5. Schymanski, E. L. et al. Critical assessment of small molecule identification 2016: automated methods. J. Cheminform. 9, 22 (2017).
6. Samaraweera, M. A., Hall, L. M., Hill, D. W. & Grant, D. F. Evaluation of an artificial neural network retention index model for chemical structure identification in nontargeted metabolomics. Anal. Chem. 90, 12752–12760 (2018).
7. Dührkop, K. et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat. Methods 16, 299–302 (2019).
8. Dührkop, K. et al. Classes for the masses: systematic classification of unknowns using fragmentation spectra. Preprint at https://www.biorxiv.org/content/10.1101/2020.04.17.046672v1 (2020).
9. Kind, T. & Fiehn, O. Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinform. 8, 105 (2007).
10. Stein, S. E. Mass spectral reference libraries: an ever-expanding resource for chemical identification. Anal. Chem. 84, 7274–7282 (2012).
11. Vinaixa, M. et al. Mass spectral databases for LC/MS- and GC/MS-based metabolomics: state of the field and future prospects. Trends Anal. Chem. 78, 23–35 (2016).
12. Horai, H. et al. MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass Spectrom. 45, 703–714 (2010).
13. da Silva, R. R., Dorrestein, P. C. & Quinn, R. A. Illuminating the dark matter in metabolomics. Proc. Natl Acad. Sci. USA 112, 12549–12550 (2015).
14. Wang, M. et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 34, 828–837 (2016).
15. Guijas, C. et al. METLIN: a technology platform for identifying knowns and unknowns. Anal. Chem. 90, 3156–3164 (2018).
16. Alon, T. & Amirav, A. Isotope abundance analysis methods and software for improved sample identification with supersonic gas chromatography/mass spectrometry. Rapid Commun. Mass Spectrom. 20, 2579–2588 (2006).
17. Böcker, S., Letzel, M., Lipták, Zs. & Pervukhin, A. Decomposing metabolomic isotope patterns. In Proc. Works. Algorithms in Bioinformatics (WABI 2006) Vol. 4175, 12–23 (Springer, Berlin, 2006).
18. Ojanperä, S. et al. Isotopic pattern and accurate mass determination in urine drug screening by liquid chromatography/time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom. 20, 1161–1167 (2006).
19. Böcker, S., Letzel, M., Lipták, Zs. & Pervukhin, A. SIRIUS: decomposing isotope patterns for metabolite identification. Bioinformatics 25, 218–224 (2009).
20. Pluskal, T., Uehara, T. & Yanagida, M. Highly accurate chemical formula prediction tool utilizing high-resolution mass spectra, MS/MS fragmentation, heuristic rules, and isotope pattern matching. Anal. Chem. 84, 4396–4403 (2012).
21. Valkenborg, D., Mertens, I., Lemière, F., Witters, E. & Burzykowski, T. The isotopic distribution conundrum. Mass Spectrom. Rev. 31, 96–109 (2012).
22. Loos, M., Gerber, C., Corona, F., Hollender, J. & Singer, H. Accelerated isotope fine structure calculation using pruned transition trees. Anal. Chem. 87, 5738–5744 (2015).
23. Böcker, S. & Rasche, F. Towards de novo identification of metabolites by analyzing tandem mass spectra. Bioinformatics 24, i49–i55 (2008).
24. Stravs, M. A., Schymanski, E. L., Singer, H. P. & Hollender, J. Automatic recalibration and processing of tandem mass spectra using formula annotation. J. Mass Spectrom. 48, 89–99 (2013).
25. Böcker, S. & Dührkop, K. Fragmentation trees reloaded. J. Cheminform. 8, 5 (2016).
26. Rogers, S., Scheltema, R. A., Girolami, M. & Breitling, R. Probabilistic assignment of formulas to mass peaks in metabolomics experiments. Bioinformatics 25, 512–518 (2009).
27. Daly, R. et al. MetAssign: probabilistic annotation of metabolites from LC-MS data using a Bayesian clustering approach. Bioinformatics 30, 2764–2771 (2014).
28. da Silva, R. R. et al. ProbMetab: an R package for Bayesian probabilistic annotation of LC-MS-based metabolomics. Bioinformatics 30, 1336–1337 (2014).
29. Del Carratore, F. et al. Integrated probabilistic annotation: a Bayesian-based annotation method for metabolomic profiles integrating biochemical connections, isotope patterns, and adduct relationships. Anal. Chem. 91, 12799–12807 (2019).
30. Tziotis, D., Hertkorn, N. & Schmitt-Kopplin, P. Kendrick-analogous network visualisation of ion cyclotron resonance Fourier transform mass spectra: improved options for the assignment of elemental compositions and the classification of organic molecular complexity. Eur. J. Mass Spectrom. 17, 415–421 (2011).
31. Watrous, J. et al. Mass spectral molecular networking of living microbial colonies. Proc. Natl Acad. Sci. USA 109, E1743–E1752 (2012).
32. Morreel, K. et al. Systematic structural characterization of metabolites in Arabidopsis via candidate substrate-product pair networks. Plant Cell 26, 929–945 (2014).
33. Quinn, R. A. et al. Global chemical effects of the microbiome include new bile-acid conjugations. Nature 579, 123–129 (2020).
34. Esposito, M. et al. Euphorbia dendroides latex as a source of jatrophane esters: isolation, structural analysis, conformational study, and anti-CHIKV activity. J. Natural Prod. 79, 2873–2882 (2016).
35. Röst, H. L. et al. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat. Methods 13, 741–748 (2016).
36. Dührkop, K., Shen, H., Meusel, M., Rousu, J. & Böcker, S. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc. Natl Acad. Sci. USA 112, 12580–12585 (2015).
37. Nothias, L.-F. et al. Bioactivity-based molecular networking for the discovery of drug leads in natural product bioassay-guided fractionation. J. Natural Prod. 81, 758–767 (2018).
38. Pence, H. E. & Williams, A. ChemSpider: an online chemical information resource. J. Chem. Ed. 87, 1123–1124 (2010).
39. Pluskal, T., Castillo, S., Villar-Briones, A. & Oresic, M. MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinform. 11, 395 (2010).
40. Nothias, L. et al. Feature-based molecular networking in the GNPS analysis environment. Nat. Methods 17, 905–908 (2020).
41. Simón-Manso, Y. et al. Metabolite profiling of a NIST standard reference material for human plasma (SRM 1950): GC-MS, LC-MS, NMR, and clinical laboratory analyses, libraries, and web-based resources. Anal. Chem. 85, 11725–11731 (2013).
42. Vos, R. C. H. D. et al. Untargeted large-scale plant metabolomics using liquid chromatography coupled to mass spectrometry. Nat. Protocols 2, 778–791 (2007).
43. Agarwal, V. et al. Complexity of naturally produced polybrominated diphenyl ethers revealed via mass spectrometry. Environ. Sci. Technol. 49, 1339–1346 (2015).
44. Andersen, R. & America, P. S. Algal Culturing Techniques (Elsevier Science, 2005).
45. Dittmar, T., Koch, B., Hertkorn, N. & Kattner, G. A simple and efficient method for the solid-phase extraction of dissolved organic matter (SPE-DOM) from seawater. Limnol. Oceanogr. Meth. 6, 230–235 (2008).
46. Petras, D. et al. High-resolution liquid chromatography tandem mass spectrometry enables large scale molecular characterization of dissolved organic matter. Front. Mar. Sci. 4, 405 (2017).
47. Meusel, M. et al. Predicting the presence of uncommon elements in unknown biomolecules from isotope patterns. Anal. Chem. 88, 7556–7566 (2016).
48. Sumner, L. W. et al. Proposed minimum reporting standards for chemical analysis. Metabolomics 3, 211–221 (2007).
49. Karp, R. M. in Complexity of Computer Computations (eds Miller, R. E. & Thatcher, J. W.) 85–103 (Plenum Press, 1972).
50. Downey, R. G. & Fellows, M. R. Parameterized Complexity (Springer, Berlin, 1999).
51. Zuckerman, D. Linear degree extractors and the inapproximability of max clique and chromatic number. In Proc. ACM Symp. on Theory of Computing (STOC 2006) 681–690 (2006).
52. Chen, J., Huang, X., Kanj, I. A. & Xia, G. Strong computational lower bounds via parameterized complexity. J. Comp. Syst. Sci. 72, 1346–1367 (2006).
53. Impagliazzo, R. & Paturi, R. On the complexity of k-SAT. J. Comp. Syst. Sci. 62, 367–375 (2001).
54. Rasche, F. et al. Identifying the unknowns by aligning fragmentation trees. Anal. Chem. 84, 3417–3426 (2012).
55. Geman, S. & Geman, D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984).
56. Ludwig, M., Dührkop, K. & Böcker, S. Bayesian networks for mass spectrometric metabolite identification via molecular fingerprints. Bioinformatics 34, i333–i340 (2018).
57. Li, L. et al. MyCompoundID: using an evidence-based metabolome library for metabolite identification. Anal. Chem. 85, 3401–3408 (2013).
58. Meringer, M., Reinker, S., Zhang, J. & Muller, A. MS/MS data improves automated determination of molecular formulas by mass spectrometry. MATCH Commun. Math. Comput. Chem. 65, 259–290 (2011).
59. Heuerding, S. & Clerc, J. T. Simple tools for the computer-aided interpretation of mass spectra. Chemometr. Intell. Lab. Syst. 20, 57–69 (1993).
60. Dührkop, K. et al. boecker-lab/sirius-libs: SIRIUS 4.0.1 including ZODIAC (Version v4.0.1_with_ZODIAC). https://doi.org/10.5281/zenodo.3985859 (2020).

Acknowledgements

We thank M. Witting for discussions and F. Kretschmer for the fragmentation tree visualization. We acknowledge financial support by the Deutsche Forschungsgemeinschaft to S.B., K.D., M.F., M.A.H. and M.L. (grant BO 1910/20) and D.P. (grant PE 2600/1). I.K. acknowledges funding from the Blasker Environmental Grant, San Diego Foundation. F.V. was funded by the Department of Navy, Office of Naval Research Multidisciplinary University Research Initiative (MURI) Award (award number N00014-15-1-2809). L.-F.N. was supported by European Union’s Horizon 2020 grants (MSCA-GF, 704786). M.M. acknowledges funding from the National Science Foundation (award number 1354050). We acknowledge financial support by the US National Institutes of Health to P.C.D. for the Center for Computational Mass Spectrometry (grant P41 GM103484), the re-use of metabolomics data (grant R03 CA211211) and the tools for rapid and accurate structure elucidation of natural products (grant R01 GM107550). P.C.D. also acknowledges support from the Sloan Foundation and from the Gordon and Betty Moore Foundation.

Author information

Authors and Affiliations

Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
Marcus Ludwig, Kai Dührkop, Markus Fleischauer, Martin A. Hoffmann & Sebastian Böcker
Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
Louis-Félix Nothias, Irina Koester, Daniel Petras & Pieter C. Dorrestein
Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
Louis-Félix Nothias, Daniel Petras, Fernando Vargas & Pieter C. Dorrestein
Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
Irina Koester, Daniel Petras & Lihini Aluwihare
International Max Planck Research School ‘Exploration of Ecological Interactions with Molecular and Chemical Techniques’, Max Planck Institute for Chemical Ecology, Jena, Germany
Martin A. Hoffmann
Division of Biological Science, University of California San Diego, La Jolla, CA, USA
Fernando Vargas
Department of Biological and Environmental Sciences, University of West Alabama, Livingston, AL, USA
Mustafa Morsy

Contributions

S.B. designed the research. S.B. and M.L. developed the computational method with help from K.D. M.L. implemented the computational method with contributions from K.D. and M.F. M.L. and L.-F.N. performed the method evaluation, coordinated by S.B. L.-F.N., I.K. and L.A. contributed to the interpretation of results. M.F. and M.L. integrated ZODIAC into SIRIUS. M.A.H. contributed to the visualization of the novel compound’s data. Mass spectrometry experiments were performed for the dendroides dataset by L.-F.N., for the NIST1950 dataset by F.V., for the tomato dataset by L.-F.N. and M.M. and for the diatoms dataset by I.K. and D.P. L.A. and P.C.D. coordinated the experimental part of the study. S.B. and M.L. wrote the manuscript, to which L.-F.N. and I.K. contributed, in cooperation with all other authors.

Corresponding author

Correspondence to Sebastian Böcker.

Ethics declarations

Competing interests

S.B., K.D., M.F., M.A.H. and M.L. are founders of Bright Giant GmbH. P.C.D. is the scientific advisor for Sirenas LLC.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Statistics on compounds with annotated ground truth molecular formulas.

Given are the total number of compounds, the number of compounds with a ground truth molecular formula and the number of those that are in the top 50 of SIRIUS-ranked candidates. The median m/z and the 25th and 75th percentiles consider only candidates in the top 50. We report the maximum absolute value of all relative mass errors in a dataset. Finally, sample standard deviations (STD) of relative mass errors are computed assuming a mean mass error of zero.
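The mass error statistics described in this legend can be sketched as follows. This is a minimal illustration under two assumptions that the legend does not state: relative mass errors are expressed in parts per million, and the zero-mean standard deviation divides by n rather than n − 1. The function names are hypothetical.

```python
import math


def relative_mass_error_ppm(measured_mz, theoretical_mz):
    """Relative mass error in parts per million (the unit is an assumption)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def zero_mean_std(errors):
    """Standard deviation of errors around an assumed mean of zero.

    With the mean fixed at zero no degree of freedom is spent on estimating
    it, so this sketch divides by n (an assumption, not from the source).
    """
    if not errors:
        raise ValueError("at least one error value is required")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def max_abs_error(errors):
    """Maximum absolute value of all relative mass errors."""
    return max(abs(e) for e in errors)


# Hypothetical relative mass errors in ppm for one dataset.
errors_ppm = [1.5, -2.5, 0.5]
print(zero_mean_std(errors_ppm), max_abs_error(errors_ppm))
```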

Extended Data Fig. 2 Distribution of compound masses.

Distribution of precursor ion m/z of the compounds used as ground truth for the evaluation of the molecular formula annotation on the five datasets. Bins of width 100 are centred at 100, 200, …, 800 m/z.

Extended Data Fig. 3 ZODIAC processing and evaluation workflow.

(1) Each LC-MS/MS run is processed individually; input mzML/mzXML files are processed using OpenMS, performing feature and adduct detection and producing files in SIRIUS input format. Resulting features combine MS1, MS/MS and adduct information. (2), (3) Filtering is performed on feature, MS/MS and peak level. (4) Similar features are merged between different runs using hierarchical clustering; MS/MS are combined and a best isotope pattern is selected per feature. (5) Missing isotope peaks are searched in MS1 spectra to extend isotope patterns. (6) A final feature filtering step is performed; the remaining features are considered as compounds. (7) SIRIUS is executed. (8) Compounds with few explained peaks are discarded, since a badly explained MS/MS spectrum indicates low quality. (9) ZODIAC is run on the remaining compounds. (10) SIRIUS and ZODIAC are evaluated on the same set of compounds.

Extended Data Fig. 4 Molecular formula annotation error rates.

Error rates on five datasets. Methods are SIRIUS; ZODIAC (without anchors); exact mass over elements carbon, hydrogen, nitrogen and oxygen (‘exact mass (CHNO)’); exact mass over CHNO plus phosphorus and sulfur (‘exact mass (CHNOPS)’); Seven Golden Rules with elements CHNOPS (‘7GR (CHNOPS)’); Seven Golden Rules with elements CHNOPS plus bromine and chlorine (‘7GR (CHNOPSBrCl)’); and GenForm. Between 44 and 271 compounds were processed per dataset; see Extended Data Fig. 1 for details. GenForm is the only publicly available tool for molecular formula inference besides SIRIUS, and considers both the isotope pattern and the fragmentation spectrum. GenForm was restricted to elements CHNOPS, and 7GR (CHNOPSBrCl) cannot annotate iodine-containing compounds; consequently, only SIRIUS and ZODIAC are in theory capable of annotating the two novel molecular formulas C₂₄H₄₇BrNO₈P and C₁₅H₃₀ClIO₅ reported here. Error rates are based on all compounds with established ground truth, resulting in slightly higher error rates for SIRIUS and ZODIAC on dendroides, tomato and mice stool compared to Fig. 1. Error rates on the five datasets agree well with the mass of compounds in the respective dataset (see Extended Data Fig. 1): larger compounds result in substantially more candidates to be considered, in particular for a larger set of elements, and result in worse annotation rates. For evaluation details see the Methods section.
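The observation that larger compounds yield more candidates can be checked with a small decomposition sketch. It is not the ‘exact mass (CHNO)’ baseline as evaluated in the paper: it works on neutral monoisotopic masses, applies no chemical plausibility filter, and the 10 ppm tolerance and the function names are assumptions chosen for illustration.

```python
# Monoisotopic masses in daltons.
MASS = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443, "O": 15.99491461957}


def formula_mass(c, h, n, o):
    """Monoisotopic mass of the neutral formula C_c H_h N_n O_o."""
    return c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]


def decompose(target, ppm=10.0):
    """List all (c, h, n, o) whose mass lies within `ppm` of `target`.

    Pure mass arithmetic: chemically implausible formulas are kept.
    """
    tol = target * ppm * 1e-6
    hits = []
    for c in range(int((target + tol) // MASS["C"]) + 1):
        rest_c = target - c * MASS["C"]
        for n in range(int((rest_c + tol) // MASS["N"]) + 1):
            rest_n = rest_c - n * MASS["N"]
            for o in range(int((rest_n + tol) // MASS["O"]) + 1):
                rest = rest_n - o * MASS["O"]
                # The tolerance is far below one hydrogen mass, so only the
                # nearest integer hydrogen count can match.
                h = round(rest / MASS["H"])
                if h >= 0 and abs(rest - h * MASS["H"]) <= tol:
                    hits.append((c, h, n, o))
    return hits


small = decompose(formula_mass(6, 12, 0, 6))    # C6H12O6, about 180 Da
large = decompose(formula_mass(18, 36, 0, 18))  # C18H36O18, about 540 Da
print(len(small), len(large))
```

Adding further elements such as phosphorus, sulfur, bromine or chlorine adds one loop per element, which is why the candidate count grows quickly with both mass and element set.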

Extended Data Fig. 5 Seven Golden Rules applied to annotated molecular formulas.

For each ZODIAC molecular formula annotation, we test whether it meets the molecular formula subset of the Seven Golden Rules (7GR). Each dot represents one annotated compound; molecular formulas are sorted by ZODIAC score.
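A check of this kind can be sketched for one part of the rules, the element-ratio test. The sketch covers only that part, not the full molecular formula subset of the Seven Golden Rules; the thresholds in `RATIO_LIMITS` are illustrative values in the spirit of the ratio rule of Kind and Fiehn and should be checked against that paper before use. The function names are hypothetical.

```python
import re

# Illustrative (min, max) bounds for element-to-carbon ratios.
# Assumed values for this sketch, to be verified against Kind & Fiehn (2007).
RATIO_LIMITS = {
    "H": (0.2, 3.1),
    "N": (0.0, 1.3),
    "O": (0.0, 1.2),
    "P": (0.0, 0.3),
    "S": (0.0, 0.8),
    "Cl": (0.0, 0.8),
    "Br": (0.0, 0.8),
}


def parse_formula(formula):
    """Parse a formula such as 'C24H47BrNO8P' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def passes_ratio_rules(formula):
    """Return True if every listed element-to-carbon ratio is within bounds.

    Elements without an entry in RATIO_LIMITS (for example iodine) are ignored.
    """
    counts = parse_formula(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        return False
    for element, (low, high) in RATIO_LIMITS.items():
        ratio = counts.get(element, 0) / carbon
        if not low <= ratio <= high:
            return False
    return True


for formula in ["C24H47BrNO8P", "C15H30ClIO5", "C2H10O"]:
    print(formula, passes_ratio_rules(formula))
```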

Extended Data Fig. 6 Novel molecular formulas.

All molecular formulas are absent from the largest molecular structure databases PubChem and ChemSpider. Only molecular formula annotations that meet three criteria are reported: a ZODIAC score of at least 0.98, at least 95% of the MS/MS spectrum intensity explained by the SIRIUS fragmentation tree, and at least one molecular formula of the compound connected to five or more compounds. More than one hypothetical compound in an LC-MS run may be annotated with the same molecular formula, potentially corresponding to different isomers. For such cases, ‘#comp.’ is the number of hypothetical compounds annotated with the given molecular formula, and ‘max score’ is the maximum ZODIAC score among these annotations. The corresponding compounds are given in Supplementary Table 5. For 90.00% of the compounds, SIRIUS top-ranks the same molecular formula.
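The reporting criteria in this legend amount to a three-part filter. The sketch below applies the thresholds stated above (ZODIAC score of at least 0.98, at least 95% explained intensity, connection to five or more compounds); the `Annotation` record, its field names and the example values are hypothetical and do not reflect the output format of SIRIUS or ZODIAC.

```python
from dataclasses import dataclass

MIN_ZODIAC_SCORE = 0.98
MIN_EXPLAINED_INTENSITY = 0.95
MIN_CONNECTED_COMPOUNDS = 5


@dataclass
class Annotation:
    """Hypothetical record for one annotated compound."""
    formula: str
    zodiac_score: float
    explained_intensity: float  # fraction of MS/MS intensity explained by the tree
    connected_compounds: int    # best connectivity among the compound's candidates


def is_reportable(annotation):
    """Apply the three thresholds from the figure legend."""
    return (
        annotation.zodiac_score >= MIN_ZODIAC_SCORE
        and annotation.explained_intensity >= MIN_EXPLAINED_INTENSITY
        and annotation.connected_compounds >= MIN_CONNECTED_COMPOUNDS
    )


# Made-up values for illustration only.
annotations = [
    Annotation("C24H47BrNO8P", 0.99, 0.97, 8),
    Annotation("C15H30ClIO5", 0.99, 0.90, 6),   # too little intensity explained
    Annotation("C10H20O5", 0.95, 0.99, 12),     # score below threshold
]
reportable = [a.formula for a in annotations if is_reportable(a)]
print(reportable)
```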

Supplementary information

Supplementary Information

Supplementary Figs. 1–7, Supplementary Table 4, Supplementary Note 1.

Reporting Summary

Supplementary Table 1

Manually annotated molecular formulas for compounds in the dendroides dataset. These molecular formulas serve as ground truth for evaluation of SIRIUS and ZODIAC.

Supplementary Table 2

Spectral library hits for datasets NIST1950, tomato, diatoms and mice stool. The molecular formulas of these library hits serve as ground truth for evaluation of SIRIUS and ZODIAC.

Supplementary Table 3

List of input files used for evaluation of the five datasets. The included files in mzML/mzXML format correspond to the LC-MS/MS runs that were used for evaluation. These runs are subsets of the data provided at the MassIVE repository.

Supplementary Table 5

Compounds with a novel molecular formula. Detailed information is provided for the compounds corresponding to the novel molecular formulas in Extended Data Fig. 6. All molecular formulas are absent from the largest molecular structure databases PubChem and ChemSpider.

Source data

Source Data Fig. 1

Statistical source data

Source Data Fig. 2

Statistical source data

Source Data Fig. 3

Mass spectra

Source Data Fig. 4

Mass spectra

Source Data Fig. 5

Statistical source data

Source Data Extended Data Fig. 2

Statistical source data

Source Data Extended Data Fig. 4

Statistical source data

Source Data Extended Data Fig. 5

Statistical source data

About this article

Cite this article

Ludwig, M., Nothias, LF., Dührkop, K. et al. Database-independent molecular formula annotation using Gibbs sampling through ZODIAC. Nat Mach Intell 2, 629–641 (2020). https://doi.org/10.1038/s42256-020-00234-6

Received: 01 April 2020
Accepted: 04 September 2020
Published: 13 October 2020
Issue Date: October 2020
DOI: https://doi.org/10.1038/s42256-020-00234-6
