The distance and median problems in the single-cut-or-join model with single-gene duplications.
Algorithms for Molecular Biology ( IF 1.5 ) Pub Date : 2020-05-04 , DOI: 10.1186/s13015-020-00169-y
Aniket C Mane 1 , Manuel Lafond 2 , Pedro C Feijao 3 , Cedric Chauve 1

Background: In the field of genome rearrangement algorithms, models that account for gene duplication often lead to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, the problem is NP-complete for most extensions of these models that account for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and the Small Parsimony problem, are intractable for most duplication-free models, with some exceptions, for example the Single-Cut-or-Join (SCJ) model.

Results: We introduce a variant of the SCJ distance that accounts for duplicated genes, in the context of directed evolution from an ancestral genome to a descendant genome where orthology relations between ancestral genes and their descendants are known. Our model includes two duplication mechanisms: single-gene tandem duplication and the creation of single-gene circular chromosomes. We prove that in this model, computing the directed distance and a parsimonious evolutionary scenario in terms of SCJ and single-gene duplication events can be done in linear time. We also show that the directed median problem is tractable for this distance, while the rooted median problem, where we assume that one of the given genomes is ancestral to the median, is NP-complete. We also describe an Integer Linear Program for solving the rooted median problem. We evaluate the directed distance and rooted median algorithms on simulated data.

Conclusion: Our results provide a simple genome rearrangement model, extending the SCJ model to account for single-gene duplications, for which we prove a mix of tractability and hardness results. For the NP-complete rooted median problem, we design a simple Integer Linear Program. Our publicly available implementation of these algorithms for the directed distance and median problems allows these problems to be solved efficiently on large instances.
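For readers unfamiliar with the underlying model, the following is a minimal sketch of the classic duplication-free SCJ distance, not the directed distance with duplications introduced in this paper. It assumes each genome is given as a list of chromosomes, each a list of signed gene identifiers plus a circularity flag. In that setting a genome is viewed as a set of adjacencies between gene extremities, and the distance is the number of adjacencies that must be cut plus the number that must be joined. The names `adjacencies` and `scj_distance` and the input encoding are illustrative assumptions, not taken from the authors' implementation.

```python
from typing import FrozenSet, List, Set, Tuple

Extremity = Tuple[int, str]            # (gene id, "h" for head or "t" for tail)
Adjacency = FrozenSet[Extremity]
Chromosome = Tuple[List[int], bool]    # (signed genes, is_circular); assumed encoding


def _right(gene: int) -> Extremity:
    # Reading left to right, a forward gene ends with its head, a reversed gene with its tail.
    return (abs(gene), "h" if gene > 0 else "t")


def _left(gene: int) -> Extremity:
    # A forward gene starts with its tail, a reversed gene with its head.
    return (abs(gene), "t" if gene > 0 else "h")


def adjacencies(genome: List[Chromosome]) -> Set[Adjacency]:
    """Collect the adjacency set of a duplication-free genome."""
    result: Set[Adjacency] = set()
    for genes, is_circular in genome:
        for a, b in zip(genes, genes[1:]):
            result.add(frozenset((_right(a), _left(b))))
        if is_circular and genes:
            # Close the circle: last gene is adjacent to the first one.
            result.add(frozenset((_right(genes[-1]), _left(genes[0]))))
    return result


def scj_distance(g1: List[Chromosome], g2: List[Chromosome]) -> int:
    """Classic SCJ distance: cuts for adjacencies only in g1 plus joins for those only in g2."""
    a1, a2 = adjacencies(g1), adjacencies(g2)
    return len(a1 - a2) + len(a2 - a1)


if __name__ == "__main__":
    ancestor = [([1, 2, 3], False)]
    descendant = [([1, -2, 3], False)]   # gene 2 inverted
    print(scj_distance(ancestor, descendant))
```

Because the computation only builds two sets and compares them, it runs in time linear in the genome size with hashed sets. Handling duplicated genes, as the paper does, requires more than this set comparison and is not attempted here.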

Updated: 2020-05-04