Interrogating genomic-scale data to resolve recalcitrant nodes in the Spider Tree of Life
Molecular Biology and Evolution (IF 11.0), Pub Date: 2020-09-28, DOI: 10.1093/molbev/msaa251
Siddharth Kulkarni 1,2, Robert J Kallal 2, Hannah Wood 2, Dimitar Dimitrov 3, Gonzalo Giribet 4, Gustavo Hormiga 1

Genome-scale data sets are converging on robust, stable phylogenetic hypotheses for many lineages; however, some nodes have shown disagreement across classes of data. We use spiders (Araneae) as a system to identify the causes of incongruence in phylogenetic signal between three classes of data: exons (as in phylotranscriptomics), non-coding regions (included in ultraconserved elements [UCE] analyses), and a combination of both (as in UCE analyses). Gene orthologs, coded as amino acids and nucleotides (with and without third codon positions), were generated by querying published transcriptomes for UCEs, recovering 1,931 UCE loci (codingUCEs). We expected that congeners represented in the codingUCE and UCEs data would form clades in the presence of phylogenetic signal. Non-coding regions derived from UCE sequences were recovered to test the stability of relationships. Phylogenetic relationships resulting from all analyses were largely congruent. All nucleotide data sets from transcriptomes, UCEs, or a combination of both recovered similar topologies in contrast with results from transcriptomes analyzed as amino acids. Most relationships inferred from low occupancy data sets, containing several hundreds of loci, were congruent across Araneae, as opposed to high occupancy data matrices with fewer loci, which showed more variation. Furthermore, we found that low occupancy data sets analyzed as nucleotides (as is typical of UCE data sets) can result in more congruent relationships than high occupancy data sets analyzed as amino acids (as in phylotranscriptomics). Thus, omitting data, through amino acid translation or via retention of only high occupancy loci, may have a deleterious effect in phylogenetic reconstruction.
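
The two data-reduction steps the abstract contrasts, retaining only high occupancy loci and dropping third codon positions, can be illustrated with a minimal sketch. It is not the authors' pipeline: the locus names, taxon labels, sequences, and thresholds below are hypothetical toy values, and the helper functions are introduced only for illustration. Occupancy is taken here to mean the fraction of sampled taxa that have a sequence for a given locus.

```python
from typing import Dict

# Hypothetical toy data: locus name -> {taxon label -> in-frame nucleotide sequence}.
Locus = Dict[str, str]

loci: Dict[str, Locus] = {
    "uce-1": {"A": "ATGGCC", "B": "ATGGCA", "C": "ATGGCT", "D": "ATGGCG"},
    "uce-2": {"A": "ATGAAA", "B": "ATGAAG"},
    "uce-3": {"A": "ATGTTT", "B": "ATGTTC", "C": "ATGTTA"},
}
N_TAXA = 4  # total number of taxa sampled in this toy example


def occupancy(locus: Locus, n_taxa: int) -> float:
    """Fraction of all sampled taxa that have a sequence for this locus."""
    return len(locus) / n_taxa


def filter_by_occupancy(
    all_loci: Dict[str, Locus], n_taxa: int, min_occupancy: float
) -> Dict[str, Locus]:
    """Keep only loci whose occupancy is at least min_occupancy."""
    return {
        name: locus
        for name, locus in all_loci.items()
        if occupancy(locus, n_taxa) >= min_occupancy
    }


def drop_third_positions(seq: str) -> str:
    """Remove every third base, assuming seq starts at codon position 1."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


# A permissive threshold keeps more loci; a strict one keeps fewer.
low_occupancy_matrix = filter_by_occupancy(loci, N_TAXA, min_occupancy=0.5)
high_occupancy_matrix = filter_by_occupancy(loci, N_TAXA, min_occupancy=0.75)

print(sorted(low_occupancy_matrix))
print(sorted(high_occupancy_matrix))
print(drop_third_positions("ATGGCC"))
```

In this toy case the stricter threshold discards one of the three loci, which mirrors the trade-off the abstract describes: a more complete matrix is obtained at the cost of fewer loci.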

Updated: 2020-09-28