Repetitive genomic regions and the inference of demographic history
Heredity (IF 3.8), Pub Date: 2021-05-17, DOI: 10.1038/s41437-021-00443-8
Ajinkya Bharatraj Patil, Nagarjun Vijay

Inference of demographic histories using whole-genome datasets has provided insights into diversification, adaptation, hybridization, and plant–pathogen interactions, and stimulated debate on the impact of anthropogenic interventions and past climate on species demography. However, the impact of repetitive genomic regions on these inferences has mostly been ignored by masking of repeats. We use the Populus trichocarpa genome (Pop_tri_v3) to show that masking of repeat regions leads to lower estimates of effective population size (Ne) in the distant past in contrast to an increase in Ne estimates in recent times. However, in human datasets, masking of repeats resulted in lower estimates of Ne at all time points. We demonstrate that repeats affect demographic inferences using diverse methods like PSMC, MSMC, SMC++, and the Stairway plot. Our genomic analysis revealed that the biases in Ne estimates were dependent on the repeat class type and its abundance in each atomic interval. Notably, we observed a weak, yet consistently significant negative correlation between the repeat abundance of an atomic interval and the Ne estimates for that interval, which potentially reflects the recombination rate variation within the genome. The rationale for the masking of repeats has been that variants identified within these regions are erroneous. We find that polymorphisms in some repeat classes occur in callable regions and reflect reliable coalescence histories (e.g., LTR Gypsy, LTR Copia). The current demography inference methods do not handle repeats explicitly, and hence the effect of individual repeat classes needs careful consideration in comparative analysis. Deciphering the repeat demographic histories might provide a clear understanding of the processes involved in repeat accumulation.
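
The reported negative correlation between repeat abundance and Ne per atomic interval can be illustrated with a small rank-correlation sketch. The abstract does not say which correlation statistic the authors used, so the choice of Spearman's rho here is an assumption, and the function names and toy values below are purely illustrative, not data from the paper.

```python
# Minimal sketch: Spearman rank correlation between per-interval repeat
# abundance and Ne estimates. Spearman is an assumed choice of statistic;
# the abstract does not name one.

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy values, one pair per atomic interval (not from the paper).
toy_repeat_abundance = [0.10, 0.25, 0.40, 0.55]
toy_ne_estimates = [9000, 9500, 7000, 6000]
toy_rho = spearman(toy_repeat_abundance, toy_ne_estimates)
```

A negative `toy_rho` means that intervals with more repeat content tend to have lower Ne estimates, which is the direction of the relationship the abstract describes. A significance test would be needed on real data before drawing any conclusion.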




Updated: 2021-05-17