Distinguishing among complex evolutionary models using unphased whole-genome data through random forest approximate Bayesian computation
Molecular Ecology Resources (IF 5.5), Pub Date: 2020-09-30, DOI: 10.1111/1755-0998.13263
Silvia Ghirotto 1 , Maria Teresa Vizzari 1 , Francesca Tassi 2 , Guido Barbujani 2 , Andrea Benazzo 2

Inferring past demographic histories is crucial in population genetics, and the number of complete genomes now available should, in principle, facilitate this inference. In practice, however, the available inferential methods suffer from severe limitations. Although hundreds of complete genomes can be analysed simultaneously, complex demographic processes can easily exceed computational constraints, and the procedures used to evaluate the reliability of the estimates further increase the computational effort. Here we present an approximate Bayesian computation framework based on the random forest algorithm (ABC-RF) to infer complex past population processes from complete genomes. To this end, we propose summarizing the data by the full genomic distribution of the four mutually exclusive categories of segregating sites (FDSS), a statistic that is fast to compute from unphased genome data and does not require the ancestral state of alleles to be known. We constructed an efficient ABC pipeline and, using simulated data, tested how accurately it recognizes the true model among models of increasing complexity, considering different sampling strategies in terms of the number of individuals analysed and the number and size of the genetic loci considered. We also compared the FDSS with the unfolded and folded site frequency spectrum (SFS), and for each of these statistics we identified the experimental conditions that maximize the inferential power of the ABC-RF procedure. Finally, we analysed real data sets, testing models of the dispersal of anatomically modern humans out of Africa and exploring the evolutionary relationships among the three orangutan species inhabiting Borneo and Sumatra.
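The abstract does not spell out the four categories or how their "full genomic distribution" is built, so the following is only a minimal sketch under stated assumptions. It assumes the categories are the classic two-population classes (polymorphisms private to population A, private to population B, shared polymorphisms, and fixed differences), that each locus is a list of biallelic sites, and that unphased diploid genotypes are coded as 0, 1 or 2 copies of an arbitrary reference allele. The names `classify_site`, `locus_counts`, `fdss` and the `max_count` binning are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical encoding: a site is (genotypes of population A, genotypes of
# population B), one unphased genotype per diploid individual, coded as the
# number of copies (0, 1 or 2) of an arbitrary reference allele.
Site = Tuple[Sequence[int], Sequence[int]]

# Assumed four categories (classic two-population classes; not confirmed by the abstract).
CATEGORIES = ("private_a", "private_b", "shared", "fixed")


def classify_site(pop_a: Sequence[int], pop_b: Sequence[int]) -> Optional[str]:
    """Return the category of one biallelic site, or None if it does not segregate."""
    n_a, n_b = 2 * len(pop_a), 2 * len(pop_b)  # number of sampled chromosomes
    k_a, k_b = sum(pop_a), sum(pop_b)  # reference-allele counts; phase is not needed
    poly_a = 0 < k_a < n_a
    poly_b = 0 < k_b < n_b
    if poly_a and poly_b:
        return "shared"
    if poly_a:
        return "private_a"
    if poly_b:
        return "private_b"
    # Both samples are monomorphic: a fixed difference if they carry different alleles.
    if (k_a == 0 and k_b == n_b) or (k_a == n_a and k_b == 0):
        return "fixed"
    return None  # monomorphic across the whole sample


def locus_counts(sites: Sequence[Site]) -> Dict[str, int]:
    """Count how many sites of one locus fall into each category."""
    tally: Counter = Counter()
    for pop_a, pop_b in sites:
        category = classify_site(pop_a, pop_b)
        if category is not None:
            tally[category] += 1
    return {cat: tally[cat] for cat in CATEGORIES}


def fdss(loci: Sequence[Sequence[Site]], max_count: int = 5) -> Dict[str, List[float]]:
    """For each category, the fraction of loci carrying 0, 1, ..., >= max_count such sites."""
    if not loci:
        raise ValueError("need at least one locus")
    hist = {cat: [0] * (max_count + 1) for cat in CATEGORIES}
    for sites in loci:
        for cat, count in locus_counts(sites).items():
            hist[cat][min(count, max_count)] += 1  # last bin pools counts >= max_count
    n_loci = len(loci)
    return {cat: [c / n_loci for c in bins] for cat, bins in hist.items()}


# Usage: two toy loci, two diploid individuals per population.
locus1 = [([0, 1], [0, 0]), ([2, 2], [0, 0]), ([1, 0], [1, 2])]
locus2 = [([0, 0], [1, 1]), ([0, 0], [0, 0])]
print(fdss([locus1, locus2], max_count=2))
```

Whether a site is polymorphic within a sample, or fixed for different alleles between samples, does not depend on which allele is labelled "reference", so relabelling alleles leaves every category unchanged; this is why a statistic of this kind needs no ancestral state. In an ABC-RF setting, vectors like these computed on simulated data would serve as the predictors for a random-forest classifier, with the generating model as the label. Pooling counts above `max_count` into one bin is just one arbitrary way to keep the vector a fixed length across simulations.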

Updated: 2020-09-30