Network-based hierarchical population structure analysis for large genomic data sets

  Noah A. Rosenberg (1)

  1. Department of Biology, Stanford University, Stanford, California 94305, USA
  2. Department of Computer Science, Ben-Gurion University of the Negev, Be'er-Sheva, 8410501, Israel
  3. Department of Biology, Washington University, St. Louis, Missouri 63130, USA
  4. Department of Evolutionary and Environmental Ecology, University of Haifa, Haifa, 31905, Israel
  • Corresponding author: gili.greenbaum{at}gmail.com
  • Abstract

    Analysis of population structure in natural populations using genetic data is a common practice in ecological and evolutionary studies. With large genomic data sets of populations now appearing more frequently across the taxonomic spectrum, it is becoming increasingly possible to reveal many hierarchical levels of structure, including fine-scale genetic clusters. To analyze these data sets, methods need to be appropriately suited to the challenges of extracting multilevel structure from whole-genome data. Here, we present a network-based approach for constructing population structure representations from genetic data. The use of community-detection algorithms from network theory generates a natural hierarchical perspective on the representation that the method produces. The method is computationally efficient, and it requires relatively few assumptions regarding the biological processes that underlie the data. We demonstrate the approach by analyzing population structure in the model plant species Arabidopsis thaliana and in human populations. These examples illustrate how network-based approaches for population structure analysis are well suited to extracting valuable ecological and evolutionary information in the era of large genomic data sets.
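
The abstract does not give algorithmic details, so the following is only a rough illustrative sketch of the general idea of a hierarchical, network-based clustering of individuals. It is not the authors' method. Everything in it is an assumption made for illustration: the toy genotypes, the identity-by-state similarity measure, the edge-pruning thresholds, and the use of connected components as a deliberately simple stand-in for a real community-detection algorithm. The function names (`ibs_similarity`, `components`, `hierarchy`) are hypothetical.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Mean identity-by-state similarity of two genotype vectors.

    Genotypes are allele counts (0, 1, or 2) per locus; this measure is an
    illustrative assumption, not one taken from the article.
    """
    return sum(1 - abs(a - b) / 2 for a, b in zip(g1, g2)) / len(g1)


def components(nodes, edges):
    """Connected components by depth-first search.

    Used here as a simple stand-in for a community-detection algorithm.
    """
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        seen.add(n)
        stack, comp = [n], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(sorted(comp))
    return comps


def hierarchy(genotypes, nodes, thresholds):
    """Recursively split `nodes`, keeping only edges at or above each threshold."""
    if not thresholds or len(nodes) < 2:
        return sorted(nodes)
    t = thresholds[0]
    edges = [
        (u, v)
        for u, v in combinations(nodes, 2)
        if ibs_similarity(genotypes[u], genotypes[v]) >= t
    ]
    return [hierarchy(genotypes, c, thresholds[1:]) for c in components(nodes, edges)]


# Toy data: five hypothetical individuals genotyped at four loci.
genotypes = {
    "a1": [0, 0, 0, 0],
    "a2": [0, 0, 0, 1],
    "b1": [2, 2, 0, 0],
    "b2": [2, 2, 1, 0],
    "c1": [2, 2, 2, 2],
}

# Two hierarchical levels: a coarse split, then a finer split within each group.
tree = hierarchy(genotypes, list(genotypes), thresholds=[0.6, 0.8])
print(tree)
```

Raising the threshold inside each group is what produces the nested output: the coarse level separates the "a" individuals from the rest, and the finer level then separates "c1" from the "b" individuals. A real analysis would replace the connected-components step with a proper community-detection algorithm and would choose thresholds in a principled way.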

    Footnotes

    • Received March 6, 2019.
    • Accepted November 1, 2019.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
