Balancing selection maintains hyper-divergent haplotypes in Caenorhabditis elegans

Lee, Daehan; Zdraljevic, Stefan; Stevens, Lewis; Wang, Ye; Tanny, Robyn E.; Crombie, Timothy A.; Cook, Daniel E.; Webster, Amy K.; Chirakar, Rojin; Baugh, L. Ryan; Sterken, Mark G.; Braendle, Christian; Félix, Marie-Anne; Rockman, Matthew V.; Andersen, Erik C.

doi:10.1038/s41559-021-01435-x

Published: 05 April 2021

Nature Ecology & Evolution volume 5, pages 794–807 (2021)

Abstract

Across diverse taxa, selfing species have evolved independently from outcrossing species thousands of times. The transition from outcrossing to selfing decreases the effective population size, effective recombination rate and heterozygosity within a species. These changes lead to a reduction in genetic diversity, and therefore adaptive potential, by intensifying the effects of random genetic drift and linked selection. Within the nematode genus Caenorhabditis, selfing has evolved at least three times, and all three species, including the model organism Caenorhabditis elegans, show substantially reduced genetic diversity relative to outcrossing species. Selfing and outcrossing Caenorhabditis species are often found in the same niches, but we still do not know how selfing species with limited genetic diversity can adapt to these environments. Here, we examine the whole-genome sequences from 609 wild C. elegans strains isolated worldwide and show that genetic variation is concentrated in punctuated hyper-divergent regions that cover 20% of the C. elegans reference genome. These regions are enriched in environmental response genes that mediate sensory perception, pathogen response and xenobiotic stress response. Population genomic evidence suggests that genetic diversity in these regions has been maintained by long-term balancing selection. Using long-read genome assemblies for 15 wild strains, we show that hyper-divergent haplotypes contain unique sets of genes and show levels of divergence comparable to levels found between Caenorhabditis species that diverged millions of years ago. These results provide an example of how species can avoid the evolutionary dead end associated with selfing.

**Fig. 1: Genetically divergent wild *C. elegans* strains isolated from the Pacific region.**

**Fig. 2: Characterization of hyper-divergent regions at the isotype level.**

**Fig. 3: Punctuated hyper-divergent genomic regions are widespread across the *C. elegans* species.**

**Fig. 4: Balancing selection has maintained hyper-divergent haplotypes enriched in environmental response genes.**

**Fig. 5: Hyper-divergent haplotypes contain ancient genetic diversity.**

Data availability

The raw short-read sequencing reads for the strains used in this project are available from the NCBI Sequence Read Archive (project PRJNA549503). The raw PacBio long-read data, along with the de novo assemblies and gene predictions, are available from the NCBI Sequence Read Archive (project PRJNA692613). Strain information and short-read genomic variation data are available from the CeNDR (www.elegansvariation.org)⁶⁸.

Code availability

All datasets and code for generating the figures and tables are available from GitHub (https://github.com/AndersenLab/Ce-328pop-div).

References

Barrett, S. C. H. The evolution of plant sexual diversity. Nat. Rev. Genet. 3, 274–284 (2002).
Cutter, A. D. Reproductive transitions in plants and animals: selfing syndrome, sexual selection and speciation. New Phytol. 224, 1080–1094 (2019).
Pollak, E. On the theory of partially inbreeding finite populations. I. Partial selfing. Genetics 117, 353–360 (1987).
Kaplan, N. L., Hudson, R. R. & Langley, C. H. The ‘hitchhiking effect’ revisited. Genetics 123, 887–899 (1989).
Charlesworth, D. & Charlesworth, B. Quantitative genetics in plants: the effect of the breeding system on genetic variability. Evolution 49, 911–920 (1995).
Baker, H. G. Self-compatibility and establishment after ‘long-distance’ dispersal. Evolution 9, 347–349 (1955).
Baker, H. G. Support for Baker’s law—as a rule. Evolution 21, 853–856 (1967).
Charlesworth, D. & Wright, S. I. Breeding systems and genome evolution. Curr. Opin. Genet. Dev. 11, 685–690 (2001).
Stebbins, G. L. Self fertilization and population variability in the higher plants. Am. Nat. 91, 337–354 (1957).
Andersen, E. C. et al. Chromosome-scale selective sweeps shape Caenorhabditis elegans genomic diversity. Nat. Genet. 44, 285–290 (2012).
Cutter, A. D., Baird, S. E. & Charlesworth, D. High nucleotide polymorphism and rapid decay of linkage disequilibrium in wild populations of Caenorhabditis remanei. Genetics 174, 901–913 (2006).
Dey, A., Chan, C. K. W., Thomas, C. G. & Cutter, A. D. Molecular hyperdiversity defines populations of the nematode Caenorhabditis brenneri. Proc. Natl Acad. Sci. USA 110, 11056–11060 (2013).
Kiontke, K. et al. Caenorhabditis phylogeny predicts convergence of hermaphroditism and extensive intron loss. Proc. Natl Acad. Sci. USA 101, 9003–9008 (2004).
Sivasundar, A. & Hey, J. Population genetics of Caenorhabditis elegans: the paradox of low polymorphism in a widespread species. Genetics 163, 147–157 (2003).
Barrière, A. & Félix, M.-A. High local genetic diversity and low outcrossing rate in Caenorhabditis elegans natural populations. Curr. Biol. 15, 1176–1184 (2005).
Félix, M.-A. & Duveau, F. Population dynamics and habitat sharing of natural populations of Caenorhabditis elegans and C. briggsae. BMC Biol. 10, 59 (2012).
Schulenburg, H. & Félix, M.-A. The natural biotic environment of Caenorhabditis elegans. Genetics 206, 55–86 (2017).
Crombie, T. A. et al. Deep sampling of Hawaiian Caenorhabditis elegans reveals high genetic diversity and admixture with global populations. eLife 8, e50465 (2019).
Andrés, A. M. et al. Targets of balancing selection in the human genome. Mol. Biol. Evol. 26, 2755–2764 (2009).
Amambua-Ngwa, A. et al. Population genomic scan for candidate signatures of balancing selection to guide antigen characterization in malaria parasites. PLoS Genet. 8, e1002992 (2012).
Siewert, K. M. & Voight, B. F. Detecting long-term balancing selection using allele frequency correlation. Mol. Biol. Evol. 34, 2996–3005 (2017).
Wu, Q. et al. Long-term balancing selection contributes to adaptation in Arabidopsis and its relatives. Genome Biol. 18, 217 (2017).
Koenig, D. et al. Long-term balancing selection drives evolution of immunity genes in Capsella. eLife 8, e43606 (2019).
Langley, C. H. et al. Genomic variation in natural populations of Drosophila melanogaster. Genetics 192, 533–598 (2012).
Leffler, E. M. et al. Multiple instances of ancient balancing selection shared between humans and chimpanzees. Science 339, 1578–1582 (2013).
Charlesworth, D. Balancing selection and its effects on sequences in nearby genome regions. PLoS Genet. 2, e64 (2006).
Nordborg, M., Charlesworth, B. & Charlesworth, D. Increased levels of polymorphism surrounding selectively maintained sites in highly selfing species. Proc. R. Soc. Lond. Ser. B Biol. Sci. 263, 1033–1039 (1996).
Wiuf, C., Zhao, K., Innan, H. & Nordborg, M. The probability and chromosomal extent of trans-specific polymorphism. Genetics 168, 2363–2372 (2004).
Seidel, H. S., Rockman, M. V. & Kruglyak, L. Widespread genetic incompatibility in C. elegans maintained by balancing selection. Science 319, 589–594 (2008).
Greene, J. S. et al. Balancing selection shapes density-dependent foraging behaviour. Nature 539, 254–258 (2016).
Van Sluijs, L. et al. Balancing selection shapes the intracellular pathogen response in natural Caenorhabditis elegans populations. Preprint at bioRxiv https://doi.org/10.1101/579151 (2019).
Thompson, O. A. et al. Remarkably divergent regions punctuate the genome assembly of the Caenorhabditis elegans Hawaiian strain CB4856. Genetics 200, 975–989 (2015).
Kim, C. et al. Long-read sequencing reveals intra-species tolerance of substantial structural variations and new subtelomere formation in C. elegans. Genome Res. 29, 1023–1035 (2019).
Richaud, A., Zhang, G., Lee, D., Lee, J. & Félix, M.-A. The local coexistence pattern of selfing genotypes in Caenorhabditis elegans natural metapopulations. Genetics 208, 807–821 (2018).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Rockman, M. V. & Kruglyak, L. Recombinational landscape and population genomics of Caenorhabditis elegans. PLoS Genet. 5, e1000419 (2009).
Rockman, M. V., Skrovanek, S. S. & Kruglyak, L. Selection at linked sites shapes heritable phenotypic variation in C. elegans. Science 330, 372–376 (2010).
Cutter, A. D. & Payseur, B. A. Genomic signatures of selection at linked sites: unifying the disparity among species. Nat. Rev. Genet. 14, 262–274 (2013).
Gimond, C. et al. Outbreeding depression with low genetic variation in selfing Caenorhabditis nematodes. Evolution 67, 3087–3101 (2013).
Cutter, A. D., Morran, L. T. & Phillips, P. C. Males, outcrossing, and sexual selection in Caenorhabditis nematodes. Genetics 213, 27–57 (2019).
Barrett, R. D. H. & Schluter, D. Adaptation from standing genetic variation. Trends Ecol. Evol. 23, 38–44 (2008).
Schulenburg, H., Hoeppner, M. P., Weiner, J. 3rd & Bornberg-Bauer, E. Specificity of the innate immune system and diversity of C-type lectin domain (CTLD) proteins in the nematode Caenorhabditis elegans. Immunobiology 213, 237–250 (2008).
Reddy, K. C. et al. An intracellular pathogen response pathway promotes proteostasis in C. elegans. Curr. Biol. 27, 3544–3553.e5 (2017).
Bakowski, M. A. et al. Ubiquitin-mediated response to microsporidia and virus infection in C. elegans. PLoS Pathog. 10, e1004200 (2014).
Chang, H. C., Paek, J. & Kim, D. H. Natural polymorphisms in C. elegans HECW-1 E3 ligase affect pathogen avoidance behaviour. Nature 480, 525–529 (2011).
Troemel, E. R., Félix, M.-A., Whiteman, N. K., Barrière, A. & Ausubel, F. M. Microsporidia are natural intracellular parasites of the nematode Caenorhabditis elegans. PLoS Biol. 6, 2736–2752 (2008).
Félix, M.-A. et al. Natural and experimental infection of Caenorhabditis nematodes by novel viruses related to nodaviruses. PLoS Biol. 9, e1000586 (2011).
Chen, K., Franz, C. J., Jiang, H., Jiang, Y. & Wang, D. An evolutionarily conserved transcriptional response to viral infection in Caenorhabditis nematodes. BMC Genom. 18, 303 (2017).
Balla, K. M., Andersen, E. C., Kruglyak, L. & Troemel, E. R. A wild C. elegans strain has enhanced epithelial immunity to a natural microsporidian parasite. PLoS Pathog. 11, e1004583 (2015).
Ashe, A. et al. A deletion polymorphism in the Caenorhabditis elegans RIG-I homolog disables viral RNA dicing and antiviral immunity. eLife 2, e00994 (2013).
Martin, N., Singh, J. & Aballay, A. Natural genetic variation in the Caenorhabditis elegans response to Pseudomonas aeruginosa. G3 7, 1137–1147 (2017).
Thomas, C. G. et al. Full-genome evolutionary histories of selfing, splitting, and selection in Caenorhabditis. Genome Res. 25, 667–678 (2015).
Kiontke, K. C. et al. A phylogeny and molecular barcodes for Caenorhabditis, with numerous new species from rotting fruits. BMC Evol. Biol. 11, 339 (2011).
Busch, J. W. & Delph, L. F. Evolution: selfing takes species down Stebbins’s blind alley. Curr. Biol. 27, R61–R63 (2017).
Ferrari, C. et al. Ephemeral-habitat colonization and neotropical species richness of Caenorhabditis nematodes. BMC Ecol. 17, 43 (2017).
Greene, J. S., Dobosiewicz, M., Butcher, R. A., McGrath, P. T. & Bargmann, C. I. Regulatory changes in two chemoreceptor genes contribute to a Caenorhabditis elegans QTL for foraging behavior. eLife 5, e21454 (2016).
Lee, D. et al. Selection and gene flow shape niche-associated variation in pheromone response. Nat. Ecol. Evol. 3, 1455–1463 (2019).
Webster, A. K. et al. Population selection and sequencing of Caenorhabditis elegans wild isolates identifies a region on chromosome III affecting starvation resistance. G3 9, 3477–3488 (2019).
Ghosh, R., Andersen, E. C., Shapiro, J. A., Gerke, J. P. & Kruglyak, L. Natural variation in a chloride channel subunit confers avermectin resistance in C. elegans. Science 335, 574–578 (2012).
Ben-David, E., Burga, A. & Kruglyak, L. A maternal-effect selfish genetic element in Caenorhabditis elegans. Science 356, 1051–1055 (2017).
Liu, Y. et al. Pan-genome of wild and cultivated soybeans. Cell 182, 162–176.e13 (2020).
Cutter, A. D., Wasmuth, J. D. & Washington, N. L. Patterns of molecular evolution in Caenorhabditis preclude ancient origins of selfing. Genetics 178, 2093–2104 (2008).
Brandvain, Y., Slotte, T., Hazzouri, K. M., Wright, S. I. & Coop, G. Genomic identification of founding haplotypes reveals the history of the selfing species Capsella rubella. PLoS Genet. 9, e1003754 (2013).
Todesco, M. et al. Massive haplotypes underlie ecotypic differentiation in sunflowers. Nature 584, 602–607 (2020).
Burgarella, C. et al. Adaptive introgression: an untapped evolutionary mechanism for crop adaptation. Front. Plant Sci. 10, 4 (2019).
Kanzaki, N. et al. Biology and genome of a newly discovered sibling species of Caenorhabditis elegans. Nat. Commun. 9, 3216 (2018).
Andersen, E. C., Bloom, J. S., Gerke, J. P. & Kruglyak, L. A variant in the neuropeptide receptor npr-1 is a major determinant of Caenorhabditis elegans growth and physiology. PLoS Genet. 10, e1004156 (2014).
Cook, D. E., Zdraljevic, S., Roberts, J. P. & Andersen, E. C. CeNDR, the Caenorhabditis elegans Natural Diversity Resource. Nucleic Acids Res. 45, D650–D657 (2017).
Cook, D. E. et al. The genetic basis of natural variation in Caenorhabditis elegans telomere length. Genetics 204, 371–383 (2016).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Lee, R. Y. N. et al. WormBase 2017: molting into a new stage. Nucleic Acids Res. 46, D869–D874 (2018).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv https://doi.org/10.1101/201178 (2018).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w¹¹¹⁸; iso-2; iso-3. Fly 6, 80–92 (2012).
Ortiz, E. M. vcf2phylip v2.0: convert a VCF matrix into several matrix formats for phylogenetic analysis. GitHub https://github.com/edgardomortiz/vcf2phylip (2019).
Schliep, K. P. phangorn: phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011).
Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T.-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
Browning, B. L. & Browning, S. R. Detecting identity by descent and estimating genotype error rates in sequence data. Am. J. Hum. Genet. 93, 840–851 (2013).
Miles, A., Ralph, P., Rae, S. & Pisupati, R. cggh/scikit-allel: v1.2.1. Zenodo https://doi.org/10.5281/zenodo.3238280 (2019).
Siewert, K. M. & Voight, B. F. BetaScan2: standardized statistics to detect balancing selection utilizing substitution data. Genome Biol. Evol. 12, 3873–3877 (2020).
Siewert, K. BetaScan GitHub https://github.com/ksiewert/BetaScan (2017).
Zhang, C., Dong, S.-S., Xu, J.-Y., He, W.-M. & Yang, T.-L. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics 35, 1786–1788 (2019).
Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158 (2020).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
Laetsch, D. R. & Blaxter, M. L. BlobTools: interrogation of genome assemblies. F1000Res. 6, 1287 (2017).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421 (2009).
Pundir, S., Martin, M. J. & O’Donovan, C. in Protein Bioinformatics: From Protein Modifications and Networks to Proteomics (eds Wu, C. H. et al.) 41–55 (Springer, 2017).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2018).
C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018 (1998).
Delcher, A. L., Salzberg, S. L. & Phillippy, A. M. Using MUMmer to identify similar regions in large sequence sets. Curr. Protoc. Bioinform. 10, 10.3 (2003).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Holdorf, A. D. et al. WormCat: an online tool for annotation and visualization of Caenorhabditis elegans genome-scale data. Genetics 214, 279–294 (2019).
Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Carlson, M. org.Ce.eg.db: Genome wide annotation for Worm. R package version 3.8.2 https://bioconductor.org/packages/release/data/annotation/html/org.Ce.eg.db.html (2019).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinform. 6, 31 (2005).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).
Bradley, R. K. et al. Fast statistical alignment. PLoS Comput. Biol. 5, e1000392 (2009).
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).
Stein, L. D. et al. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol. 1, E45 (2003).
Yin, D. et al. Rapid genome shrinkage in a self-fertile nematode reveals sperm competition proteins. Science 359, 55–61 (2018).
Stevens, L. et al. The genome of Caenorhabditis bovis. Curr. Biol. 30, 1023–1031.e4 (2020).

Acknowledgements

We thank members of the Andersen laboratory for providing comments on this manuscript. We especially thank M. Ailion, J. David, R. Luallen, N. Pujol and citizen scientists for contributing wild C. elegans strains to CeNDR. We also thank the Duke University School of Medicine for use of the Sequencing and Genomic Technologies Shared Resource, which provided Pacific Biosciences long-read sequencing. This work was funded by an NSF CAREER award (1751035) and a Human Frontier Science Program Award (RGP0001/2019) (to E.C.A.). This work was also funded by National Institutes of Health (NIH) grant ES029930 (to E.C.A., M.V.R. and L.R.B.). S.Z. received funding from The Cellular and Molecular Basis of Disease training programme (T32GM008061) and the Rappaport Award for Research Excellence through the IBiS graduate programme. A.K.W. is supported by the National Science Foundation Graduate Research Fellowship. Long-read sequencing of three isolates was funded by the NIH (R01 GM117408 to L.R.B.) and a T32 training grant for the University Program in Genetics and Genomics (GM007754). M.V.R. is supported by NIH grant GM121828. M.G.S. was supported by an NWO Domain Applied and Engineering Sciences Veni grant (17282).

Author information

Daehan Lee
Present address: Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
Stefan Zdraljevic
Present address: Department of Human Genetics, University of California, Los Angeles, CA, USA
Stefan Zdraljevic
Present address: Howard Hughes Medical Institute, University of California, Los Angeles, CA, USA
Ye Wang
Present address: Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, Chengdu Research Base of Giant Panda Breeding, Chengdu, People’s Republic of China
These authors contributed equally: Daehan Lee, Stefan Zdraljevic, Lewis Stevens.

Authors and Affiliations

Department of Molecular Biosciences, Northwestern University, Evanston, IL, USA
Daehan Lee, Stefan Zdraljevic, Lewis Stevens, Ye Wang, Robyn E. Tanny, Timothy A. Crombie, Daniel E. Cook & Erik C. Andersen
Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA
Stefan Zdraljevic
Department of Biology, Duke University, Durham, NC, USA
Amy K. Webster, Rojin Chirakar & L. Ryan Baugh
University Program in Genetics and Genomics, Duke University, Durham, NC, USA
Amy K. Webster
Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
L. Ryan Baugh
Laboratory of Nematology, Wageningen University and Research, Wageningen, the Netherlands
Mark G. Sterken
Université Côte d’Azur, CNRS, Inserm, IBV, Nice, France
Christian Braendle
Institut de Biologie de l’Ecole Normale Supérieure, Centre National de la Recherche Scientifique, INSERM, École Normale Supérieure, Paris Sciences et Lettres, Paris, France
Marie-Anne Félix
Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
Matthew V. Rockman

Contributions

D.L., S.Z. and E.C.A. conceived of and designed the study. D.L., S.Z., L.S. and E.C.A. analysed the data and wrote the manuscript. Y.W., R.E.T. and D.E.C. performed whole-genome sequencing and isotype characterization for 609 wild C. elegans strains. R.E.T. performed long-read sequencing for 11 C. elegans wild isolates. R.C., A.K.W. and L.R.B. performed long-read sequencing for three C. elegans wild isolates. M.G.S., C.B., M.V.R. and M.-A.F. contributed wild isolates to the C. elegans strain collection. M.G.S., C.B., M.V.R., M.-A.F. and T.A.C. edited the manuscript.

Corresponding author

Correspondence to Erik C. Andersen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Ecology & Evolution thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Chromosome-scale selective sweeps across wild C. elegans isotypes.

a, The genome-wide distribution of the most frequent haplotype (red) among 324 wild isotypes with known geographic origin is shown. Grey genomic regions represent other haplotypes, and white represents unclassified haplotypes. Each row is one of the 324 isotypes, grouped by the geographic origin. The genomic position in Mb is plotted on the x-axis, and each tick mark represents 5 Mb of the chromosome. b, Beeswarm plots of the proportion of the most frequent haplotype for each chromosome from (a) for 324 isotypes with known geographic origins are shown. Wild isotypes are grouped by geographic origin. Each point corresponds to one of the 324 isotypes, and geographic origins are shown on the y-axis.

Extended Data Fig. 2 Patterns of molecular diversity across the C. elegans genome.

The chromosomal patterns of a, Watterson’s theta (θ) and b, nucleotide diversity (π) for non-overlapping 1 kb windows are shown. Each dot corresponds to the calculated value for a particular window. The genomic position in Mb is plotted on the x-axis. Diversity statistic values are shown on the y-axis. Smoothed lines (blue) are LOESS fits. c, Tukey box plots of genetic diversity statistics from (a) are shown with outlier data points plotted. Genetic diversity statistics for each window are grouped by the chromosomal regions defined previously³⁶. Genetic diversity statistic values are shown on the y-axis. The horizontal line in the middle of the box is the median, and the box denotes the 25th to 75th quantiles of the data. The vertical line represents the 1.5× interquartile range.
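For readers unfamiliar with the two statistics plotted here, the following is a minimal sketch of their textbook per-site definitions for a single window. It is not the authors' pipeline. It assumes biallelic sites coded 0/1, one allele per strain (reasonable for largely homozygous selfing strains) and no missing data. The function names `harmonic` and `window_diversity` are hypothetical.

```python
from typing import List, Sequence, Tuple


def harmonic(n: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i, the Watterson correction factor."""
    return sum(1.0 / i for i in range(1, n))


def window_diversity(
    sites: List[Sequence[int]], window_bp: int = 1000
) -> Tuple[float, float]:
    """Per-site Watterson's theta and nucleotide diversity for one window.

    `sites` holds one entry per variant position; each entry lists the
    0/1 allele carried by every strain (assumption: no missing calls).
    """
    if not sites:
        return 0.0, 0.0
    n = len(sites[0])  # number of sampled strains
    if n < 2:
        raise ValueError("at least two strains are required")

    segregating = 0
    pairwise_sum = 0.0
    for alleles in sites:
        j = sum(alleles)  # count of the alternative allele
        if 0 < j < n:  # skip monomorphic positions
            segregating += 1
            # Expected number of differences per pair at this site.
            pairwise_sum += 2.0 * j * (n - j) / (n * (n - 1))

    theta_w = segregating / harmonic(n) / window_bp
    pi = pairwise_sum / window_bp
    return theta_w, pi
```

Applied to every non-overlapping 1 kb window, the two return values correspond to the quantities on the y-axes of panels a and b.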

Extended Data Fig. 3 Optimization of parameters for the characterization of hyper-divergent regions.

a,b, The total detected hyper-divergent regions in Mb (x-axis) and the percent overlap of long-read and short-read hyper-divergent classification (y-axis) are shown (Methods). Each point corresponds to one combination of the threshold parameters (variant count and coverage fraction) used to classify a 1 kb bin as hyper-divergent. Each point is coloured by the variant count threshold (a) or the coverage fraction threshold (b). c, The relationship between the total size of hyper-divergent regions detected by the optimized short-read or long-read based approach is shown. Each point corresponds to one of the 15 long-read sequenced isotypes. Total sizes of hyper-divergent regions detected by the short-read based approach are shown on the x-axis, and total sizes of hyper-divergent regions detected by the long-read based approach are shown on the y-axis. d, The overlap between hyper-divergent regions defined by the optimized short-read based approach and long-read based approach is shown. Each point corresponds to one of the 15 long-read sequenced isotypes. Total sizes of hyper-divergent regions detected by either short-read or long-read based approach are shown on the x-axis, and the percentages of hyper-divergent regions detected by both approaches are shown on the y-axis.

Extended Data Fig. 4 Summary statistics for hyper-divergent regions across six chromosomes.

a, Bar plots for the comparisons of variant (SNV/indel) density (top) and coverage fraction (bottom) between hyper-divergent regions (red) and the rest of the regions (blue) in each chromosomal region are shown. Note that no hyper-divergent region was found on the tips of chromosome I. b, Fold differences between hyper-divergent regions and the rest of the regions from (a) are shown.

Extended Data Fig. 5 Genomic signatures of balancing selection in non-divergent regions and hyper-divergent regions.

Tukey box plots of Tajima’s D (a) and standardized beta (b) are shown. Genomic bins (1 kb) (a) or variants (b) are grouped and coloured by their classification: (1) non-divergent bins (yellow), (2) hyper-divergent bins with high variant density (≥ 16 SNVs/indels, red), (3) hyper-divergent bins with low read depth (< 35%, blue). Hyper-divergent bins are grouped by their species-wide frequencies: rare (<1%), intermediate (≥ 1% and < 5%), or common (≥ 5%). The horizontal line in the middle of the box is the median, and the box denotes the 25th to 75th quantiles of the data. The vertical line represents the 1.5x interquartile range.
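The grouping used in this figure can be sketched as a pair of small functions. This is an illustration of the thresholds stated in the caption only (≥ 16 SNVs/indels, < 35% coverage, and the 1% and 5% frequency cut-offs). It omits the additional steps described in the Methods. The function names, the labels and the rule that variant density takes precedence when both criteria are met are assumptions made for the example.

```python
def classify_bin(
    variant_count: int,
    coverage_fraction: float,
    min_variants: int = 16,      # caption: >= 16 SNVs/indels per 1 kb bin
    min_coverage: float = 0.35,  # caption: read depth below 35%
) -> str:
    """Label one 1 kb bin of one strain using the caption's thresholds."""
    if variant_count >= min_variants:
        # Assumption: variant density is checked first when both apply.
        return "high_variant_density"
    if coverage_fraction < min_coverage:
        return "low_read_depth"
    return "non_divergent"


def frequency_class(n_divergent: int, n_total: int) -> str:
    """Bin a hyper-divergent call by its species-wide frequency.

    Integer arithmetic avoids floating-point edge cases at exactly 1% and 5%.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if 100 * n_divergent < n_total:        # frequency < 1%
        return "rare"
    if 100 * n_divergent < 5 * n_total:    # 1% <= frequency < 5%
        return "intermediate"
    return "common"                        # frequency >= 5%
```

For example, with a hypothetical panel of 300 strains, a bin called hyper-divergent in 10 strains would fall in the intermediate class.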

Extended Data Fig. 6 Gene ontology (GO) enrichment for hyper-divergent regions.

Gene ontology (GO) enrichment for the biological process category (a) and the molecular function category (b) for non-divergent chromosomal arms (squares) and hyper-divergent regions (circles) is shown. GO terms that are significantly enriched in control regions, hyper-divergent regions or both are shown on the y-axis. Bonferroni-corrected significance values for GO enrichment are shown on the x-axis. Sizes of squares and circles correspond to the fold enrichment of the annotation, and colours of squares and circles correspond to the gene counts of the annotation. The blue line shows the Bonferroni-corrected significance threshold (corrected p-value = 0.05). Note that we did not detect any GO-term enrichment of genes in non-divergent chromosomal arms for the biological process category.

Extended Data Fig. 7 Species-wide SNP-based relatedness of divergent regions is in agreement with long-read sequencing results.

The inferred C. elegans species-wide relatedness for the hyper-divergent regions that span (a) II:3,667,179-3,701,405, (b) I:2,318,291-2,381,851, and (c) V:20,193,463-20,267,244 is shown. The x-axis represents the dissimilarity of the fraction of identity-by-state in the region. For a-c, the isotype names are coloured to match the haplotypes defined by long-read sequence data in Fig. 5 and Extended Data Figs. 8 and 9, respectively. The branch colours correspond to the species-wide genetic groups identified by PCA in Fig. 1c.

Extended Data Fig. 8 Two hyper-divergent haplotypes at the peel-1 zeel-1 incompatibility locus.

a, The protein-coding gene contents of the two hyper-divergent haplotypes at the peel-1 zeel-1 incompatibility locus on the left arm of chromosome I (I:2,318,291-2,381,851 of the N2 reference genome). The tree was inferred using SNVs and coloured by inferred haplotypes. For each distinct haplotype, we chose a single isotype as a haplotype representative (orange haplotype: N2, blue haplotype: CB4856) and predicted protein-coding genes using both protein-based alignments and ab initio approaches. Protein-coding genes are shown as boxes; those genes that are conserved in all haplotypes are coloured based on their haplotype, and those genes that are not are coloured light grey. Dark grey boxes behind genes indicate coordinates of divergent regions. Genes with locus names in N2 are highlighted. b, Heatmaps showing amino acid identity for alleles of four genes (mcm-4, srbc-64, ugt-31, and sydn-1). The percentage identity was calculated using alignments of protein sequences from all 16 isotypes. Heatmaps are ordered by the SNV tree shown in (a). c, Maximum-likelihood gene trees of four genes (mcm-4, srbc-64, ugt-31, and sydn-1) inferred using amino acid alignments. Trees are plotted on the same scale (scale shown; scale is in substitutions per site). Strain names are coloured by their haplotype.

Extended Data Fig. 9 Hyper-divergent haplotypes at a region on the right arm of chromosome V.

a, The protein-coding gene contents of the seven hyper-divergent haplotypes at a region on the right arm of chromosome V (V:20,193,463-20,267,244 of the N2 reference genome). The tree was inferred using SNVs and coloured by inferred haplotypes. For each distinct haplotype, we chose a single isotype as a haplotype representative (orange haplotype: N2, light blue haplotype: JU2526, red haplotype: EG4725, pink haplotype: ECA36, green haplotype: DL238, dark blue haplotype: QX1794, purple haplotype: NIC526) and predicted protein-coding genes using both protein-based alignments and ab initio approaches. JU2526 shares the reference haplotype at fbxa-113 and fbxb-59 (six hyper-divergent haplotypes at these loci) but is divergent at Y113G7B.15 (seven hyper-divergent haplotypes at this locus). Protein-coding genes are shown as boxes; those genes that are conserved in all haplotypes are coloured based on their haplotypes, and those genes that are not are coloured light grey. Dark grey boxes behind genes indicate coordinates of divergent regions. Genes with locus names in N2 are highlighted. Of the 25 genes that are not conserved in all haplotypes (light grey boxes), ten are alleles of the three reference haplotype (N2) loci coloured in light grey. The remaining 15 do not have a clear one-to-one relationship with a gene in the reference haplotype. Seven of these 15 have homology to F54E12.2 (present in the reference haplotype) and are likely the product of duplication and diversification. Six have homology to either M04C3.1, F19B2.5, or F54E12.2, all of which are genes with SNF2 family N-terminal domains and which exist elsewhere in the N2 reference genome. Of the remaining two genes, one has homology to Y113G7B.15, which is present in the reference haplotype, and the other has homology to W09C3.8, a gene on chromosome I in the reference genome. 
Functional annotations of all unconserved loci (including BLAST hits and Pfam domains identified by InterProScan) can be found in Supplementary Data 4. b, Heatmaps show amino acid identity between alleles of five genes (srh-217, fbxb-113, fbxb-59, Y113G7B.15, and mdt-17). The percentage identity was calculated using alignments of protein sequences from all 16 isotypes. Heatmaps are ordered by the SNV tree shown in (a). c, Maximum-likelihood gene trees of five genes (srh-217, fbxb-113, fbxb-59, Y113G7B.15, and mdt-17) inferred using amino acid alignments. Trees are plotted on the same scale (scale shown; scale is in substitutions per site). Strain names are coloured by their haplotype.

Extended Data Fig. 10 Hyper-divergent regions in C. briggsae.

The genome-wide distribution of hyper-divergent regions across 35 non-reference wild C. briggsae strains is shown. In the top panel, each row is one of the 35 strains, grouped by previously defined clades (tropical or others) ordered by the total amount of genome covered by hyper-divergent regions (black). In the bottom panel, brown bars indicate genomic positions in which more than 10% of strains are classified as hyper-divergent at the locus. The genomic position in Mb is plotted on the x-axis, and each tick represents 5 Mb of the chromosome.

Supplementary information

Supplementary Information

Supplementary Figs. 1–8 and Tables 1–6.

Reporting Summary

Peer Review Information

Supplementary Tables

Supplementary Tables 1–6.

Supplementary Data

Supplementary Data 1–4.

About this article

Cite this article

Lee, D., Zdraljevic, S., Stevens, L. et al. Balancing selection maintains hyper-divergent haplotypes in Caenorhabditis elegans. Nat Ecol Evol 5, 794–807 (2021). https://doi.org/10.1038/s41559-021-01435-x

Received: 18 September 2020
Accepted: 26 February 2021
Published: 05 April 2021
Issue Date: June 2021
DOI: https://doi.org/10.1038/s41559-021-01435-x
