Species Tree Estimation and the Impact of Gene Loss Following Whole-Genome Duplication.
Systematic Biology (IF 6.1), Pub Date: 2022-10-12, DOI: 10.1093/sysbio/syac040
Haifeng Xiong 1, Danying Wang 1, Chen Shao 1, Xuchen Yang 1, Jialin Yang 2, Tao Ma 1, Charles C Davis 3, Liang Liu 2, Zhenxiang Xi 1

Whole-genome duplication (WGD) occurs broadly and repeatedly across the history of eukaryotes and is recognized as a prominent evolutionary force, especially in plants. Immediately following WGD, most genes are present in two copies as paralogs. Due to this redundancy, one copy of a paralog pair commonly undergoes pseudogenization and is eventually lost. When speciation occurs shortly after WGD, however, differential loss of paralogs may lead to spurious phylogenetic inference resulting from the inclusion of pseudoorthologs, that is, paralogous genes mistakenly identified as orthologs because they are present in single copies within each sampled species. The impact of including pseudoorthologs versus true orthologs as a result of gene extinction (or incomplete laboratory sampling) is only recently gaining empirical attention in the phylogenomics community. Moreover, few studies have investigated this phenomenon in an explicit coalescent framework. Here, using mathematical models, numerous simulated data sets, and two newly assembled empirical data sets, we assess the effect of pseudoorthologs on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and differential gene loss scenarios following WGD. When gene loss occurs along the terminal branches of the species tree, alignment-based (BPP) and gene-tree-based (ASTRAL, MP-EST, and STAR) coalescent methods are adversely affected as the degree of ILS increases. Their performance can be greatly improved by sampling a sufficiently large number of genes. Under the same circumstances, however, concatenation methods consistently estimate incorrect species trees as the number of genes increases. Additionally, pseudoorthologs can greatly mislead species tree inference when gene loss occurs along the internal branches of the species tree. Here, both coalescent and concatenation methods yield inconsistent results. These results underscore the importance of understanding the influence of pseudoorthologs in the phylogenomics era. [Coalescent method; concatenation method; incomplete lineage sorting; pseudoorthologs; single-copy gene; whole-genome duplication.]
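
To make the notion of a pseudoortholog concrete, the following is a minimal sketch of a toy model, not the authors' simulation design. It assumes that a WGD precedes all speciation events, that both paralog copies (labelled "A" and "B" here) survive down to the terminal branches, and that each sampled species then independently loses one copy at random with equal probability. Under these assumptions every gene family looks single-copy in every species, yet only families in which all species kept the same copy are true orthologs. The function names and the equal-loss assumption are illustrative only.

```python
import random
from fractions import Fraction


def prob_all_same_copy(n_species: int) -> Fraction:
    """Exact probability that all species retain the same paralog copy.

    Toy-model assumption: each species keeps copy A or B with probability 1/2,
    independently of the others. Two outcomes (all A, all B) out of 2**n.
    """
    if n_species < 1:
        raise ValueError("n_species must be at least 1")
    return Fraction(1, 2) ** (n_species - 1)


def simulate_true_ortholog_fraction(n_species: int, n_families: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of single-copy families that are true orthologs."""
    rng = random.Random(seed)
    true_orthologs = 0
    for _ in range(n_families):
        # Copy retained by each species after independent loss on its terminal branch.
        retained = [rng.choice("AB") for _ in range(n_species)]
        if len(set(retained)) == 1:
            true_orthologs += 1
    return true_orthologs / n_families


if __name__ == "__main__":
    n = 4
    exact = prob_all_same_copy(n)
    estimate = simulate_true_ortholog_fraction(n, n_families=20000, seed=1)
    print(f"exact true-ortholog fraction for {n} species: {exact}")
    print(f"simulated true-ortholog fraction: {estimate:.4f}")
    print(f"exact pseudoortholog fraction: {1 - exact}")
```

In this toy setting the share of true orthologs shrinks geometrically with the number of sampled species, which illustrates why single-copy status alone does not guarantee orthology. Real data depart from these assumptions: losses can occur on internal branches, loss rates need not be equal, and ILS adds further gene tree discordance.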

Updated: 2022-06-11