Conflict over the eukaryote root resides in strong outliers, mosaics and missing data sensitivity of site-specific (CAT) mixture models
Systematic Biology (IF 6.5), Pub Date: 2022-04-12, DOI: 10.1093/sysbio/syac029
Caesar Al Jewari, Sandra L Baldauf

Phylogenetic reconstruction using concatenated loci (“phylogenomics” or “supermatrix phylogeny”) is a powerful tool for solving evolutionary splits that are poorly resolved in single gene/protein trees (SGTs). However, recent phylogenomic attempts to resolve the eukaryote root have yielded conflicting results, along with claims of various artefacts hidden in the data. We have investigated these conflicts using two new methods for assessing phylogenetic conflict. ConJak uses whole marker (gene or protein) jackknifing to assess deviation from a central mean for each individual sequence, while ConWin uses a sliding window to screen for incongruent protein fragments (mosaics). Both methods allow selective masking of individual sequences or sequence fragments in order to minimize missing data, an important consideration for resolving deep splits with limited data. Analyses focused on a set of 76 eukaryotic proteins of bacterial ancestry previously used in various combinations to assess the branching order among the three major divisions of eukaryotes: Amorphea (mainly animals, fungi and Amoebozoa), Diaphoretickes (most other well-known eukaryotes and nearly all algae) and Excavata, represented here by Discoba (Jakobida, Heterolobosea, and Euglenozoa). ConJak analyses found strong outliers to be concentrated in under-sampled lineages, while ConWin analyses of Discoba, the most under-sampled of the major lineages, detected potentially incongruent fragments scattered throughout. Phylogenetic analyses of the full data using an LG-gamma model support a Discoba sister scenario (neozoan-excavate root), which rises to 99-100% bootstrap support with data masked according to either protocol. However, analyses with two site-specific (CAT) mixture models yielded widely inconsistent results and a striking sensitivity to missing data.
The neozoan-excavate root places Amorphea and Diaphoretickes as more closely related to each other than either is to Discoba, a fundamental relationship that should remain unaffected by additional taxa.
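The two screening ideas named in the abstract (leave-one-marker-out deviation from a central mean, and a sliding window along a sequence) can be illustrated with a minimal sketch. This is not the ConJak or ConWin implementation: the input scores, the function names, and the thresholds below are all hypothetical, chosen only to show the general shape of each technique.

```python
from statistics import mean, stdev


def jackknife_outliers(scores, threshold=3.0):
    """Flag (taxon, marker) pairs whose score deviates strongly from the
    mean of that taxon's other markers (delete-one jackknife).

    scores: {taxon: {marker: float}} -- a hypothetical per-sequence
    statistic; the abstract does not specify what ConJak measures.
    """
    flagged = []
    for taxon, by_marker in scores.items():
        for marker, value in by_marker.items():
            # Leave the current marker out and summarize the rest.
            rest = [v for m, v in by_marker.items() if m != marker]
            if len(rest) < 2:
                continue  # not enough data for a standard deviation
            sd = stdev(rest)
            if sd == 0:
                continue  # identical scores; deviation is undefined
            z = (value - mean(rest)) / sd
            if abs(z) > threshold:
                flagged.append((taxon, marker, z))
    return flagged


def sliding_window_flags(site_scores, width, step, cutoff):
    """Return (start, end) index pairs of windows whose mean score falls
    below cutoff. site_scores is a hypothetical per-site congruence score."""
    flagged = []
    for start in range(0, len(site_scores) - width + 1, step):
        window = site_scores[start:start + width]
        if mean(window) < cutoff:
            flagged.append((start, start + width))
    return flagged


# Toy data (invented for illustration only).
toy_scores = {"taxonA": {"m1": 1.0, "m2": 1.1, "m3": 0.9, "m4": 1.0, "m5": 5.0}}
outliers = jackknife_outliers(toy_scores)
windows = sliding_window_flags([1, 1, 1, -2, -2, -2, 1, 1, 1], width=3, step=3, cutoff=0)
```

In this toy input only marker `m5` of `taxonA` and the middle window would be flagged; a flagged sequence or fragment could then be masked rather than dropping the whole marker, which is the motivation the abstract gives for selective masking.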

Updated: 2022-04-12