Heuristics for Genome Rearrangement Distance With Replicated Genes, IEEE/ACM Transactions on Computational Biology and Bioinformatics

IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2021-07-07, DOI: 10.1109/tcbb.2021.3095021
Gabriel Siqueira, Klairton Lima Brito, Ulisses Dias, Zanoni Dias

In comparative genomics, one goal is to find similarities between the genomes of different organisms. Comparisons based on genome features such as genes, gene order, and regulatory sequences are carried out with this purpose in mind. Genome rearrangements are mutational events that affect large stretches of the genome. They are responsible for extant species having conserved genes in different positions across their genomes. Species that are close from an evolutionary point of view tend to have the same set of genes or to share most of them. When gene order is used to compare two genomes, a parsimony criterion can be used to estimate how close the species are: we look for the shortest sequence of genome rearrangements capable of transforming one genome into the other, and the length of that sequence is called the rearrangement distance. Reversal is one of the most studied genome rearrangement events. It acts on a segment of the genome, inverting the position and the orientation of the genes in it. Transposition is another widely studied event. It swaps the positions of two consecutive segments of the genome. When the genome has no repeated genes, a common approach is to represent it as a permutation in which each element represents a conserved block. When genomes have replicated genes, they are usually represented as strings instead. The number of replicas depends on the organisms being compared, but in many scenarios it tends to be small. In this work, we study the rearrangement distance between genomes with replicated genes, assuming that the orientation of the genes is unknown. We present four heuristics for the problem of genome rearrangement distance with replicated genes. We carry out experiments in which only reversals or only transpositions are allowed, as well as experiments in which both events are allowed. We developed a database of simulated genomes and compared our results with other algorithms from the literature.
The experiments showed that our heuristics with more sophisticated rules performed better than the known algorithms at estimating the evolutionary distance between genomes with replicated genes. To validate the application of our algorithms to real data, we constructed a phylogenetic tree based on the distances provided by our algorithm and compared it with a known tree from the literature.
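
To make the two events concrete, the following minimal Python sketch models a genome with replicated genes and unknown orientation as a plain string, applies reversals and transpositions to it, and computes the exact rearrangement distance by breadth-first search. It is not one of the four heuristics described in the abstract, which are not specified here; it is an exhaustive baseline that is only feasible for very short strings. All function and parameter names (`reversal`, `transposition`, `neighbors`, `rearrangement_distance`, `use_reversals`, `use_transpositions`) are hypothetical and chosen for illustration.

```python
from collections import deque


def reversal(genome, i, j):
    """Reverse the segment genome[i..j] (0-based, inclusive).

    Gene orientation is unknown, so only the order of the genes changes.
    """
    return genome[:i] + genome[i:j + 1][::-1] + genome[j + 1:]


def transposition(genome, i, j, k):
    """Swap the adjacent segments genome[i:j] and genome[j:k], with i < j < k."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def neighbors(genome, use_reversals=True, use_transpositions=True):
    """Yield every genome reachable from `genome` with one allowed event."""
    n = len(genome)
    if use_reversals:
        for i in range(n):
            for j in range(i + 1, n):
                yield reversal(genome, i, j)
    if use_transpositions:
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(j + 1, n + 1):
                    yield transposition(genome, i, j, k)


def rearrangement_distance(source, target,
                           use_reversals=True, use_transpositions=True):
    """Exact distance by breadth-first search (exponential; tiny inputs only)."""
    if sorted(source) != sorted(target):
        raise ValueError("genomes must contain the same multiset of genes")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        genome, dist = queue.popleft()
        if genome == target:
            return dist
        for nxt in neighbors(genome, use_reversals, use_transpositions):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # only reachable if no event type is enabled


if __name__ == "__main__":
    print(reversal("abcde", 1, 3))
    print(transposition("abcde", 1, 2, 4))
    print(rearrangement_distance("abc", "cba", use_reversals=False))
```

Because the search space grows exponentially with genome length, a search like this is mainly useful as a reference for checking the output of a heuristic on small instances.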

Updated: 2021-07-07
