Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models.
Journal of Mathematical Biology (IF 1.9), Pub Date: 2020-02-15, DOI: 10.1007/s00285-019-01465-x
Cedric Chauve 1, 2, 3 , Yann Ponty 4 , Michael Wallner 2, 5

Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes that evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation (S), gene duplication (D), gene loss (L), and horizontal gene transfer (T). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model, which considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the DLT-model). In this work, we introduce the notion of evolutionary histories, each defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the DLT-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and the complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the DLT-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.
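
To illustrate the general idea of counting and uniform sampling from a grammar by dynamic programming, here is a minimal sketch on a toy grammar. It is not one of the paper's grammars for the DL or DLT models: it assumes the simplest possible rule, T -> leaf | (T, T), which generates ordered rooted binary trees with no species-tree constraint. The function names `count_trees` and `sample_tree` are hypothetical and chosen only for this example.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(n):
    """Count ordered rooted binary trees with n leaves.

    Toy grammar (an assumption, not the paper's): T -> leaf | (T, T).
    """
    if n == 1:
        return 1
    # A tree with n leaves splits into a left subtree with k leaves
    # and a right subtree with n - k leaves.
    return sum(count_trees(k) * count_trees(n - k) for k in range(1, n))


def sample_tree(n, rng=random):
    """Draw one tree with n leaves uniformly at random (recursive method)."""
    if n == 1:
        return "leaf"
    # Pick the left-subtree size k with probability proportional to the
    # number of trees having that split, then recurse on both sides.
    r = rng.randrange(count_trees(n))
    for k in range(1, n):
        block = count_trees(k) * count_trees(n - k)
        if r < block:
            return (sample_tree(k, rng), sample_tree(n - k, rng))
        r -= block
    raise AssertionError("unreachable: r is always below count_trees(n)")


if __name__ == "__main__":
    print([count_trees(n) for n in range(1, 7)])
    print(sample_tree(5))
```

Because each split size is chosen in proportion to the number of trees it accounts for, every tree with n leaves is returned with the same probability. A grammar for evolutionary histories would need additional nonterminals tied to the branches of the species tree, which this sketch does not attempt.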

Updated: 2020-04-16