Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit

Meyer, Fernando; Lesker, Till-Robin; Koslicki, David; Fritz, Adrian; Gurevich, Alexey; Darling, Aaron E.; Sczyrba, Alexander; Bremges, Andreas; McHardy, Alice C.

doi:10.1038/s41596-020-00480-3

Review Article
Published: 01 March 2021

Nature Protocols volume 16, pages 1785–1801 (2021)

Abstract

Computational methods are key in microbiome research, and obtaining a quantitative and unbiased performance estimate is important for method developers and applied researchers. For meaningful comparisons between methods, to identify best practices and common use cases, and to reduce overhead in benchmarking, it is necessary to have standardized datasets, procedures and metrics for evaluation. In this tutorial, we describe emerging standards in computational meta-omics benchmarking derived and agreed upon by a large community of researchers. Specifically, we outline recent efforts by the Critical Assessment of Metagenome Interpretation (CAMI) initiative, which supplies method developers and applied researchers with exhaustive quantitative data about software performance in realistic scenarios and organizes community-driven benchmarking challenges. We explain the most relevant evaluation metrics for assessing metagenome assembly, binning and profiling results, and provide step-by-step instructions on how to generate them. The instructions use simulated mouse gut metagenome data released in preparation for the second round of CAMI challenges and showcase the use of a repository of tool results for CAMI datasets. This tutorial will serve as a reference for the community and facilitate informative and reproducible benchmarking in microbiome research.
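
Two binning metrics central to CAMI evaluations, and reported by AMBER, are purity and completeness. The following is a minimal sketch under stated assumptions, not AMBER's implementation: each predicted bin is mapped to the gold-standard genome contributing the most base pairs to it; purity is the fraction of the bin's base pairs coming from that genome, and completeness is the fraction of that genome's base pairs (in the gold-standard assembly) recovered in the bin. The function name `genome_binning_metrics` and the dictionary inputs are hypothetical, and the sketch assumes every bin is non-empty and every contig appears in the gold standard, as with binning of a gold-standard assembly.

```python
from collections import defaultdict


def genome_binning_metrics(bins, contig_genome, contig_length):
    """Per-bin purity and completeness (hypothetical helper, not AMBER's API).

    bins:          {bin_id: [contig_id, ...]} predicted by a binner
    contig_genome: {contig_id: genome_id} from the gold standard
    contig_length: {contig_id: length_in_bp}
    Assumes every bin is non-empty and every contig is in the gold standard.
    """
    # Total base pairs of each genome present in the gold-standard assembly.
    genome_size = defaultdict(int)
    for contig, genome in contig_genome.items():
        genome_size[genome] += contig_length[contig]

    results = {}
    for bin_id, contigs in bins.items():
        # Base pairs contributed to this bin by each gold-standard genome.
        bp_by_genome = defaultdict(int)
        for contig in contigs:
            bp_by_genome[contig_genome[contig]] += contig_length[contig]
        # Map the bin to the genome contributing the most base pairs.
        genome, true_positive_bp = max(bp_by_genome.items(), key=lambda kv: kv[1])
        bin_size = sum(bp_by_genome.values())
        results[bin_id] = {
            "genome": genome,
            "purity": true_positive_bp / bin_size,
            "completeness": true_positive_bp / genome_size[genome],
        }
    return results


if __name__ == "__main__":
    # Toy gold standard: two genomes of 1,000 bp each, split over four contigs.
    contig_genome = {"c1": "gA", "c2": "gA", "c3": "gB", "c4": "gB"}
    contig_length = {"c1": 600, "c2": 400, "c3": 500, "c4": 500}
    bins = {"bin1": ["c1", "c3"], "bin2": ["c2", "c4"]}
    for bin_id, metrics in genome_binning_metrics(bins, contig_genome, contig_length).items():
        print(bin_id, metrics)
```

Because both metrics are weighted by base pairs here, one long misbinned contig lowers purity more than several short ones. Benchmarking tools such as AMBER report further variants (for example, averages over bins or over genomes) that this sketch does not attempt to reproduce.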

**Fig. 2: MetaQUAST assembly benchmarking metrics.**

**Fig. 3: Assessing metagenome cross-sample assembly quality with MetaQUAST for the CAMI II mouse gut dataset.**

**Fig. 4: Assessing genome binners on the gold-standard assembly of the CAMI II mouse gut dataset.**

**Fig. 5: Assessing taxonomic binning results on the CAMI II mouse gut dataset.**

**Fig. 6: Number of high-quality taxon bins predicted from the CAMI II mouse gut dataset for the phylum to species ranks.**

**Fig. 7: Assessing taxonomic profiling results on the CAMI II mouse gut dataset.**
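
Fig. 7 concerns taxonomic profiling. As a hedged illustration of one error measure commonly used for profiles, the sketch below computes the L1 norm between a gold-standard and a predicted relative-abundance profile at a single taxonomic rank: the sum over taxa of the absolute abundance differences, after rescaling each profile to sum to 1, which yields values from 0 (identical profiles) to 2 (no shared taxa). The function name `l1_norm` and the dictionary input are illustrative assumptions, not OPAL's interface; taxa missing from one profile are treated as having abundance 0, and the taxon names in the example are toy data.

```python
def l1_norm(gold, predicted):
    """L1 distance between two relative-abundance profiles at one rank.

    gold, predicted: {taxon_id: abundance}; abundances may be given in
    percent or as fractions, since each profile is rescaled to sum to 1.
    (Hypothetical helper for illustration, not OPAL's API.)
    """
    def normalize(profile):
        total = sum(profile.values())
        if total <= 0:
            return {}
        return {taxon: value / total for taxon, value in profile.items()}

    gold_n, pred_n = normalize(gold), normalize(predicted)
    taxa = set(gold_n) | set(pred_n)
    # Taxa absent from a profile contribute abundance 0.
    return sum(abs(gold_n.get(t, 0.0) - pred_n.get(t, 0.0)) for t in taxa)


if __name__ == "__main__":
    # Toy genus-level profiles in percent; the prediction misses one genus.
    gold = {"Bacteroides": 50.0, "Lactobacillus": 30.0, "Akkermansia": 20.0}
    predicted = {"Bacteroides": 60.0, "Lactobacillus": 40.0}
    print(round(l1_norm(gold, predicted), 3))  # prints 0.4
```

Rescaling first keeps the measure comparable when a profiler leaves part of the sample unassigned; this is a design choice of the sketch, and benchmarking tools may treat unassigned abundance differently.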

Data availability

The results of all benchmarked methods and gold standards are available at https://zenodo.org/communities/cami. Links to individual results and DOIs are available in Supplementary Tables 1, 4, 8, and 11. The gold-standard assembly is provided with the CAMI II mouse gut dataset (Table 2). Assembly results and code used to generate Fig. 3 are available at https://github.com/CAMI-challenge/BenchmarkingToolkitTutorial. Genome and taxonomic binning, and taxonomic profiling results used in Figs. 4–7 are available, respectively, in the AMBER and OPAL GitHub repositories at https://github.com/CAMI-challenge/AMBER and https://github.com/CAMI-challenge/OPAL. The code in this paper has been peer-reviewed.

Acknowledgements

The authors thank P. B. Pope for helpful comments. A.E.D.’s contribution was facilitated in part by the Australian Research Council’s Discovery Projects funding scheme (project DP180101506). A.G.’s contribution was facilitated by St. Petersburg State University, Russia (grant ID PURE 51555639).

Author information

Authors and Affiliations

Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
Fernando Meyer, Till-Robin Lesker, Adrian Fritz, Andreas Bremges & Alice C. McHardy
German Center for Infection Research (DZIF), Braunschweig, Germany
Till-Robin Lesker & Andreas Bremges
Computer Science and Engineering, Biology, and The Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
David Koslicki
Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
Alexey Gurevich
The ithree institute, University of Technology Sydney, Sydney, Australia
Aaron E. Darling
Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
Alexander Sczyrba

Contributions

F.M. and T.-R.L. performed the experiments; F.M., A.F., T.-R.L., and A.S. prepared the data; A.C.M., A.B., and A.S. conceived the experiments; A.C.M., F.M., and A.B. wrote the manuscript with comments by others; F.M., T.-R.L., D.K., A.F., A.G., A.E.D., A.S., A.B., and A.C.M. interpreted the results, and read and approved the final manuscript.

Corresponding author

Correspondence to Alice C. McHardy.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Protocols thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Tables 1–13, Supplementary Figs. 1 and 2 and Supplementary Note.

Supplementary Data 1

Supplementary Results: MetaQUAST metrics

About this article

Cite this article

Meyer, F., Lesker, TR., Koslicki, D. et al. Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit. Nat Protoc 16, 1785–1801 (2021). https://doi.org/10.1038/s41596-020-00480-3

Received: 10 March 2020
Accepted: 26 November 2020
Published: 01 March 2021
Issue Date: April 2021
DOI: https://doi.org/10.1038/s41596-020-00480-3
