Non-phylogenetic identification of co-evolving genes for reconstructing the archaeal Tree of Life
bioRxiv - Evolutionary Biology, Pub Date: 2021-01-13, DOI: 10.1101/2020.10.16.343293
L. Thibério Rangel, Shannon M. Soucy, João C. Setubal, Johann Peter Gogarten, Gregory P. Fournier

Assessing the phylogenetic compatibility between individual gene families is a crucial and often computationally demanding step in many phylogenomics analyses. Here we describe the Evolutionary Similarity Index (I_ES) to assess shared evolution between gene families using a weighted Orthogonal Distance Regression applied to sequence distances. This approach allows for straightforward pairing of paralogs between co-evolving gene families without resorting to multiple tests, or a priori assumptions of molecular interactions between protein products from assessed genes. The use of pairwise distance matrices, while less informative than phylogenetic trees, circumvents error-prone comparisons between trees whose topologies are inherently uncertain. Analyses of simulated gene family evolution datasets showed that I_ES was more accurate and less susceptible to noise than popular tree-based methods (Robinson-Foulds and geodesic distance) for assessing evolutionary signal compatibility, since it bypasses phylogenetic reconstruction and its inherent uncertainty. Applying I_ES to a real dataset of 1,322 genes from 42 archaeal genomes identified eight major clusters of gene families with compatible evolutionary trends. Four of these clusters included genes with a taxonomic distribution across all archaeal phyla, while other clusters included a subset of taxa that do not map to generally accepted archaeal clades, indicating possible shared horizontal transfers by clustered gene families. We identify one strongly connected set of 62 genes from the same cluster, occurring as both single-copy and multiple homologs per genome, with compatible phylogenetic reconstructions closely matching previously published species trees for Archaea. An I_ES implementation is available at https://github.com/lthiberiol/evolSimIndex.
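
To make the core idea concrete, the following is a minimal sketch of an orthogonal regression between the pairwise distances of two gene families. It is not the evolSimIndex implementation and does not compute the Evolutionary Similarity Index itself. The function names, the line forced through the origin, the uniform default weights, and the "explained" score are all assumptions chosen for illustration. It also assumes both matrices list the same taxa in the same order, one gene copy per genome.

```python
import numpy as np


def upper_triangle(dist):
    """Return the off-diagonal upper-triangle entries of a square distance matrix."""
    dist = np.asarray(dist, dtype=float)
    rows, cols = np.triu_indices_from(dist, k=1)
    return dist[rows, cols]


def orthogonal_fit(dist_a, dist_b, weights=None):
    """Fit y = slope * x by (weighted) orthogonal regression through the origin.

    Hypothetical helper: x and y are the pairwise distances of two gene
    families over the same taxa. Returns the slope and the share of the
    weighted, uncentered scatter that lies along the fitted line.
    """
    x = upper_triangle(dist_a)
    y = upper_triangle(dist_b)
    # Assumption: uniform weights unless the caller supplies one weight per pair.
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)

    points = np.column_stack([x, y])
    # 2x2 weighted scatter matrix about the origin (no centering, so the
    # fitted line is constrained to pass through (0, 0)).
    scatter = (points * w[:, None]).T @ points

    # eigh returns eigenvalues in ascending order; the last eigenvector is the
    # direction that minimizes the weighted sum of squared orthogonal residuals.
    eigvals, eigvecs = np.linalg.eigh(scatter)
    direction = eigvecs[:, -1]
    slope = direction[1] / direction[0]
    explained = eigvals[-1] / eigvals.sum()
    return slope, explained


# Toy data (invented): family B evolves twice as fast as family A along the same history.
family_a = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 1.5, 2.5],
    [2.0, 1.5, 0.0, 1.0],
    [3.0, 2.5, 1.0, 0.0],
])
family_b = 2.0 * family_a

slope, explained = orthogonal_fit(family_a, family_b)
print(f"slope={slope:.3f} explained={explained:.3f}")
```

Because the residuals are measured perpendicular to the line, neither gene family is treated as the error-free predictor, which is what distinguishes this from ordinary least squares. A difference in overall substitution rate between the two families shows up in the slope rather than as a poor fit.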

Updated: 2021-01-13