A simulation study to examine the information content in phylogenomic datasets under the multispecies coalescent model.
Molecular Biology and Evolution (IF 10.7), Pub Date: 2020-07-08, DOI: 10.1093/molbev/msaa166
Jun Huang 1, 2 , Tomáš Flouri 1 , Ziheng Yang 1

Abstract
We use computer simulation to examine the information content in multilocus data sets for inference under the multispecies coalescent model. Inference problems considered include estimation of evolutionary parameters (such as species divergence times, population sizes, and cross-species introgression probabilities), species tree estimation, and species delimitation based on Bayesian comparison of delimitation models. We found that the number of loci is the most influential factor for almost all inference problems examined. Although the number of sequences per species does not appear to be important to species tree estimation, it is very influential to species delimitation. Increasing the number of sites and the per-site mutation rate both increase the mutation rate for the whole locus and these have the same effect on estimation of parameters, but the sequence length has a greater effect than the per-site mutation rate for species tree estimation. We discuss the computational costs when the data size increases and provide guidelines concerning the subsampling of genomic data to enable the application of full-likelihood methods of inference.


Updated: 2020-11-21