Haploid, diploid, and pooled exome capture recapitulate features of biology and paralogy in two non-model tree species
Molecular Ecology Resources (IF 5.5), Pub Date: 2021-07-16, DOI: 10.1111/1755-0998.13474
Brandon M Lind¹, Mengmeng Lu², Dragana Obreht Vidakovic¹, Pooja Singh², Tom R Booker¹,³, Sally N Aitken¹, Sam Yeaman²

Despite their suitability for studying evolution, many conifer species have large and repetitive giga-genomes (16–31 Gbp) that create hurdles to producing high-coverage SNP data sets that capture diversity from across the entire genome. Due in part to multiple ancient whole-genome duplication events, gene family expansion and subsequent evolution within Pinaceae, false diversity arising from the misalignment of paralog copies creates further challenges in accurately and reproducibly inferring evolutionary history from sequence data. Here, we leverage the cost-saving benefits of pool-seq and exome capture to discover SNPs in two conifer species, Douglas-fir (Pseudotsuga menziesii var. menziesii (Mirb.) Franco, Pinaceae) and jack pine (Pinus banksiana Lamb., Pinaceae). Using minimal baseline filtering, we show that allele frequencies estimated from pooled individuals correlate strongly and positively with those estimated by sequencing the same population as individuals (r > .948), on par with comparable analyses in model organisms. Further, we highlight the utility of haploid megagametophyte tissue for identifying sites that probably result from misaligned paralogs. Together with additional minor filtering, we show that it is possible to remove many of the loci with large discrepancies in frequency estimates between the individual and pooled sequencing approaches, improving the correlation further (r > .973). Our work addresses bioinformatic challenges in non-model organisms with large and complex genomes, highlights the use of megagametophyte tissue for identifying paralogous artefacts, and suggests that combining pool-seq and exome capture is robust for further evolutionary hypothesis testing in these systems.
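The two ideas in the abstract (comparing pooled versus individual allele-frequency estimates, and using haploid megagametophytes to flag likely paralogs) can be illustrated with a small sketch. This is not the authors' pipeline. It assumes the reported r is a Pearson correlation, that individual genotypes are coded as alt-allele counts (0/1/2), that pooled frequencies come from ref/alt read counts, and that any heterozygous-looking call in haploid tissue marks a site as a likely collapsed paralog. The function names, the `max_het_fraction` threshold, and all data below are hypothetical and made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def individual_alt_freq(genotypes):
    """Alt-allele frequency from diploid genotypes coded 0/1/2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called)) if called else None


def pooled_alt_freq(ref_reads, alt_reads):
    """Alt-allele frequency from pooled read counts (None if no coverage)."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth else None


def is_heterozygous(call):
    """True if a VCF-style GT string (e.g. '0/1', '0|1') carries two different alleles."""
    alleles = call.replace("|", "/").split("/")
    return len(set(alleles)) > 1


def looks_paralogous(haploid_calls, max_het_fraction=0.0):
    """Hypothetical rule: haploid tissue should never look heterozygous, so a
    het fraction above the threshold suggests misaligned paralog copies."""
    called = [c for c in haploid_calls if c is not None]
    if not called:
        return False
    het = sum(is_heterozygous(c) for c in called)
    return het / len(called) > max_het_fraction


# Made-up toy data: pooled (ref, alt) read counts, individual genotypes,
# and megagametophyte calls per site. site_E mimics a collapsed paralog.
SITES = {
    "site_A": {"pool": (60, 40), "ind": [0, 1, 1, 2, 0], "mega": ["0", "1", "0", "1"]},
    "site_B": {"pool": (88, 12), "ind": [0, 0, 1, 0, 0], "mega": ["0", "0", "1", "0"]},
    "site_C": {"pool": (22, 78), "ind": [2, 2, 1, 2, 1], "mega": ["1", "1", "0", "1"]},
    "site_D": {"pool": (42, 58), "ind": [1, 2, 1, 1, 1], "mega": ["1", "0", "1", "1"]},
    "site_E": {"pool": (95, 5), "ind": [1, 1, 1, 1, 1], "mega": ["0/1", "0/1", "1", "0/1"]},
}


def compare(sites, drop_paralogs):
    """Correlate individual vs pooled frequencies, optionally dropping flagged sites."""
    ind, pool, kept = [], [], []
    for name, s in sites.items():
        if drop_paralogs and looks_paralogous(s["mega"]):
            continue
        ind.append(individual_alt_freq(s["ind"]))
        pool.append(pooled_alt_freq(*s["pool"]))
        kept.append(name)
    return pearson_r(ind, pool), kept


r_all, _ = compare(SITES, drop_paralogs=False)
r_filtered, kept = compare(SITES, drop_paralogs=True)
print(f"r (all sites):         {r_all:.3f}")
print(f"r (paralogs removed):  {r_filtered:.3f}  kept={kept}")
```

In this toy example, removing the one site that looks heterozygous in haploid tissue raises the correlation, mirroring the direction of the effect described above. Real data would also need depth, missingness, and other filters that this sketch omits.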

Updated: 2021-07-16