Do Alignment and Trimming Methods Matter for Phylogenomic (UCE) Analyses?
Systematic Biology (IF 6.5), Pub Date: 2020-08-14, DOI: 10.1093/sysbio/syaa064
Daniel M Portik 1, 2 , John J Wiens 1

Alignment is a crucial issue in molecular phylogenetics because different alignment methods can potentially yield very different topologies for individual genes. But it is unclear if the choice of alignment methods remains important in phylogenomic analyses, which incorporate data from dozens, hundreds, or thousands of genes. For example, problematic biases in alignment might be multiplied across many loci, whereas alignment errors in individual genes might become irrelevant. The issue of alignment trimming (i.e. removing poorly aligned regions or missing data from individual genes) is also poorly explored. Here, we test the impact of 12 different combinations of alignment and trimming methods on phylogenomic analyses. We compare these methods using published phylogenomic data from ultraconserved elements (UCEs) from squamate reptiles (lizards and snakes), birds, and tetrapods. We compare the properties of alignments generated by different alignment and trimming methods (e.g., length, informative sites, missing data). We also test whether these datasets can recover well-established clades when analyzed with concatenated (RAxML) and species-tree methods (ASTRAL-III), using the full data (∼5,000 loci) and subsampled datasets (10% and 1% of loci). We show that different alignment and trimming methods can significantly impact various aspects of phylogenomic datasets (e.g. length, informative sites). However, these different methods generally had little impact on the recovery and support values for well-established clades, even across very different numbers of loci. Nevertheless, our results suggest several "best practices" for alignment and trimming. Intriguingly, the choice of phylogenetic methods impacted the results most strongly, with concatenated analyses recovering significantly more well-established clades (with stronger support) than the species-tree analyses.
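The alignment properties named in the abstract (length, informative sites, missing data) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `alignment_stats`, the toy sequences, and the decision to count `-`, `?`, and `N` as missing data are assumptions made for illustration. A site is counted as parsimony-informative when at least two nucleotide states each occur in at least two sequences.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")
MISSING = set("-?N")  # assumption: gaps, '?' and 'N' all count as missing data


def alignment_stats(seqs):
    """Summarize one alignment given as {taxon_name: aligned_sequence}.

    Returns the alignment length, the number of parsimony-informative
    sites, and the fraction of cells that are gaps or missing data.
    """
    rows = [s.upper() for s in seqs.values()]
    if not rows:
        raise ValueError("alignment is empty")
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("sequences are not the same length")

    informative = 0
    missing = 0
    for column in zip(*rows):
        # Count unambiguous nucleotides only; ambiguity codes are ignored.
        counts = Counter(c for c in column if c in NUCLEOTIDES)
        missing += sum(1 for c in column if c in MISSING)
        # Informative: at least two states, each seen in >= 2 sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1

    total_cells = length * len(rows)
    return {
        "length": length,
        "informative_sites": informative,
        "missing_fraction": missing / total_cells if total_cells else 0.0,
    }


# Hypothetical toy alignment, not data from the study.
toy = {
    "taxon_a": "ACGT-A",
    "taxon_b": "ACGTNA",
    "taxon_c": "ATGAAA",
    "taxon_d": "ATGA-C",
}
print(alignment_stats(toy))
```

Running the same function on each locus before and after trimming would give a simple per-locus comparison of how a trimming method changes these three properties.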

Last updated: 2020-08-14