Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes

  1. Rob Knight1,2,20,21
  1. Department of Pediatrics, School of Medicine, University of California, San Diego, California 92093, USA;
  2. Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, California 92093, USA;
  3. Bioinformatics and Systems Biology Program, University of California, San Diego, California 92093, USA;
  4. IBM T. J. Watson Research Center, Yorktown Heights, New York 10562, USA;
  5. IBM Research Europe, The Hartree Centre, Warrington WA4 4AD, United Kingdom;
  6. School of Life Sciences, Arizona State University, Tempe, Arizona 85281, USA;
  7. Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, Arizona 85281, USA;
  8. Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, USA;
  9. IBM Almaden Research Center, San Jose, California 95120, USA;
  10. Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki 00271, Finland;
  11. Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki 00014, Finland;
  12. Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia;
  13. Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria 3800, Australia;
  14. Department of Internal Medicine, University of Turku, Turku 20014, Finland;
  15. Division of Medicine, Turku University Hospital, Turku 20014, Finland;
  16. Department of Computing, University of Turku, Turku 20014, Finland;
  17. Department of Medicine, University of California, San Diego, California 92093, USA;
  18. Department of Pharmacology, University of California, San Diego, California 92093, USA;
  19. Department of Public Health and Primary Care, Cambridge University, Cambridge CB2 1TN, United Kingdom;
  20. Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, USA;
  21. Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, USA
  • Corresponding author: robknight{at}ucsd.edu
  • Abstract

    The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a widely used phylogenetic alpha diversity metric that has thus far failed to scale effectively to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables calculation of this metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the computational resources required, resulting in more accessible software with a smaller carbon footprint than previous approaches. The new algorithm produces results identical to those of the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power in detecting diversity differences between younger and older populations in the FINRISK study's metagenomic data.
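
For readers unfamiliar with the metric, the following is a minimal sketch of what Faith's PD measures. It is a naive per-sample traversal, not the SFPhD algorithm described in the abstract. The toy tree, the node names, and the function name `faith_pd` are hypothetical and chosen only for illustration. The sketch assumes the common convention that Faith's PD is the sum of branch lengths on the paths connecting the observed tips to the root.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical toy tree: node -> (parent, length of the branch to the parent).
# The root has no parent and contributes no branch length.
TREE: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 0.0),
    "A": ("root", 1.0),
    "B": ("root", 2.0),
    "t1": ("A", 0.5),
    "t2": ("A", 0.25),
    "t3": ("B", 1.0),
}


def faith_pd(tree: Dict[str, Tuple[Optional[str], float]],
             observed: Iterable[str]) -> float:
    """Sum the branch lengths on the paths from the observed tips to the root.

    Each branch is counted once, even when several observed tips share it.
    """
    visited = set()
    total = 0.0
    for tip in observed:
        node: Optional[str] = tip
        # Walk toward the root; stop early once we reach an already-counted node,
        # because everything above it has been counted too.
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


if __name__ == "__main__":
    # Two closely related tips share the branch above "A".
    print(faith_pd(TREE, ["t1", "t2"]))
    # Two distantly related tips share no branches, so the total is larger.
    print(faith_pd(TREE, ["t1", "t3"]))
```

In this toy tree, a sample containing two close relatives (`t1`, `t2`) scores lower than a sample containing two distant relatives (`t1`, `t3`), even though both contain two taxa. This is the phylogenetic aspect of the metric that the abstract refers to. Repeating such a walk for every sample is the kind of cost that becomes a bottleneck on trees with millions of vertices.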

    Footnotes

    • Received May 18, 2021.
    • Accepted September 1, 2021.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    1. Genome Res. 31: 2131-2137 © 2021 Armstrong et al.; Published by Cold Spring Harbor Laboratory Press