Evaluating the role of reference-genome phylogenetic distance on evolutionary inference
bioRxiv - Evolutionary Biology, Pub Date: 2021-03-04, DOI: 10.1101/2021.03.03.433733
Aparna Prasad, Eline D Lorenzen, Michael V Westbury

When a high-quality genome assembly of a target species is unavailable, one way to avoid the costly de novo assembly process is a mapping-based assembly. However, mapping shotgun data to a distant relative may lead to biased or erroneous evolutionary inference. Here, we used short-read data from a mammal and a bird species (beluga and rowi kiwi) to evaluate whether the phylogenetic distance of the reference genome can affect downstream demographic (PSMC) and genetic diversity (heterozygosity, runs of homozygosity) analyses. We mapped to assemblies of species at varying phylogenetic distances (from conspecific to >7% genome-wide divergence), as well as to de novo assemblies created using cross-species scaffolding. We show that while reference genome phylogenetic distance has an impact on demographic analyses, the effect is not pronounced until the reference genome is >3% divergent from the target species. When mapping to cross-species scaffolded assemblies, we are unable to replicate the original beluga demographic analyses but can do so for the rowi kiwi, presumably reflecting the more fragmented nature of the beluga assemblies. For genetic diversity estimates, we find that increased phylogenetic distance has a pronounced impact: heterozygosity estimates deviate incrementally as phylogenetic distance increases. Moreover, runs of homozygosity are removed when mapping to any non-conspecific assembly. However, these biases can be reduced by mapping to a cross-species scaffolded assembly. Taken together, our results show that caution should be exercised when selecting the reference genome for mapping assemblies. Cross-species scaffolding may offer a way to avoid a costly, traditional de novo assembly while still producing robust evolutionary inference.
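
The abstract does not say how heterozygosity or runs of homozygosity (ROH) were computed, so the following is only a toy illustration of the two quantities, not the study's method. It assumes a hypothetical per-site genotype encoding (0 = homozygous, 1 = heterozygous, None = not callable), fixed non-overlapping windows, and arbitrary thresholds (`max_het`, `min_windows`) chosen purely for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of per-site genotype calls for one diploid sample:
# 0 = homozygous, 1 = heterozygous, None = not callable (e.g. low coverage).
Calls = List[Optional[int]]


def heterozygosity(calls: Calls) -> Optional[float]:
    """Heterozygous sites divided by callable sites; None if nothing is callable."""
    callable_calls = [c for c in calls if c is not None]
    if not callable_calls:
        return None
    return sum(1 for c in callable_calls if c == 1) / len(callable_calls)


def runs_of_homozygosity(
    calls: Calls, window: int, max_het: float, min_windows: int
) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) window indices of runs of at least
    `min_windows` consecutive windows whose heterozygosity is <= `max_het`.
    Windows with no callable sites break a run rather than extend it."""
    per_window = [
        heterozygosity(calls[i:i + window]) for i in range(0, len(calls), window)
    ]
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for idx, het in enumerate(per_window):
        if het is not None and het <= max_het:
            if start is None:
                start = idx
            continue
        # Current window ends any open run; keep it only if long enough.
        if start is not None and idx - start >= min_windows:
            runs.append((start, idx - 1))
        start = None
    # Close a run that extends to the last window.
    if start is not None and len(per_window) - start >= min_windows:
        runs.append((start, len(per_window) - 1))
    return runs


# Toy data: ten 10-site windows; windows 3-7 contain no heterozygous sites.
demo: Calls = [
    0 if 3 <= w <= 7 else (1 if s % 5 == 0 else 0)
    for w in range(10)
    for s in range(10)
]
print(heterozygosity(demo))                      # 0.1
print(runs_of_homozygosity(demo, 10, 0.05, 3))   # [(3, 7)]

# One extra heterozygous call per window inside the run pushes each of those
# windows above max_het, so under these thresholds the run is no longer found.
noisy = list(demo)
for w in range(3, 8):
    noisy[w * 10] = 1
print(runs_of_homozygosity(noisy, 10, 0.05, 3))  # []
```

The last lines only show how sensitive a threshold-based ROH call is to a handful of additional heterozygous sites; they are not a claim about why ROH disappeared when mapping to non-conspecific assemblies in the study.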

Last updated: 2021-03-05