A probabilistic version of Sankoff’s maximum parsimony algorithm
Journal of Bioinformatics and Computational Biology (IF 0.9), Pub Date: 2020-01-31, DOI: 10.1142/s0219720020500043
Gábor Balogh 1 , Stephan H Bernhart 1 , Peter F Stadler 2, 3, 4, 5, 6 , Jana Schor 7

The number of genes belonging to a multi-gene family usually varies substantially over its evolutionary history as a consequence of gene duplications and losses. A first step toward analyzing these histories in detail is the inference of the changes in copy number that take place along the individual edges of the underlying phylogenetic tree. The corresponding maximum parsimony approach minimizes the total number of changes along the edges of the species tree. Incorrectly determined numbers of family members, however, may influence the estimates drastically. We therefore augment the analysis by introducing a probabilistic model that also considers suboptimal assignments of changes. Technically, this amounts to a partition function variant of Sankoff’s parsimony algorithm. As a showcase application, we reanalyze the gain and loss patterns of metazoan microRNA families. As expected, the differences between the probabilistic and the parsimony methods are moderate: in the limit of [Formula: see text], i.e. very little tolerance for deviations from parsimony, the total number of reconstructed changes is the same. However, we find that the partition function approach systematically predicts fewer gain and more loss events, showing that the data admit co-optimal solutions among which the parsimony approach selects biased representatives.
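
To make the relation between the two methods concrete, the following is a minimal Python sketch, not the authors' implementation. It runs Sankoff's minimum-cost recursion and a partition function analogue on a toy tree, replacing the minimum over child states with a Boltzmann-weighted sum. The edge cost (absolute difference in copy number), the temperature-like parameter name `T`, the tree, the leaf counts, and all function names are assumptions made for illustration only.

```python
import math

def edge_cost(parent_state, child_state):
    # Assumption: each unit change in copy number along an edge costs 1.
    return abs(parent_state - child_state)

def sankoff(tree, leaf_counts, states, node):
    """Minimal subtree cost for each state of `node` (Sankoff recursion)."""
    if node in leaf_counts:
        # A leaf is fixed to its observed copy number.
        return {s: 0.0 if s == leaf_counts[node] else math.inf for s in states}
    cost = {s: 0.0 for s in states}
    for child in tree[node]:
        child_cost = sankoff(tree, leaf_counts, states, child)
        for s in states:
            # Parsimony keeps only the cheapest child state.
            cost[s] += min(edge_cost(s, t) + child_cost[t] for t in states)
    return cost

def partition(tree, leaf_counts, states, node, T):
    """Same recursion with (min, +) replaced by (sum, *) over Boltzmann weights."""
    if node in leaf_counts:
        return {s: 1.0 if s == leaf_counts[node] else 0.0 for s in states}
    z = {s: 1.0 for s in states}
    for child in tree[node]:
        child_z = partition(tree, leaf_counts, states, child, T)
        for s in states:
            # Every child state contributes, weighted by exp(-cost / T).
            z[s] *= sum(math.exp(-edge_cost(s, t) / T) * child_z[t] for t in states)
    return z

# Hypothetical toy data: root -> (A, X), X -> (B, C), with observed copy numbers.
tree = {"root": ["A", "X"], "X": ["B", "C"]}
leaf_counts = {"A": 1, "B": 2, "C": 2}
states = range(4)   # allowed copy numbers 0..3 (assumed bound)
T = 0.05            # small T: little tolerance for suboptimal assignments

root_cost = sankoff(tree, leaf_counts, states, "root")
min_cost = min(root_cost.values())

root_z = partition(tree, leaf_counts, states, "root", T)
Z = sum(root_z.values())
free_energy = -T * math.log(Z)                      # approaches min_cost as T -> 0
root_posterior = {s: root_z[s] / Z for s in states}

print("parsimony cost:", min_cost)
print("-T log Z:", free_energy)
print("root state probabilities:", root_posterior)
```

In this toy case the minimum cost is 1, and it is reached with the root in state 1 (a gain on the edge to X) or in state 2 (a loss on the edge to A). A parsimony traceback would report one of these two scenarios, whereas the partition function assigns them nearly equal probability at small `T`, which illustrates how co-optimal solutions can differ in their gain and loss counts.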

Updated: 2020-01-31