Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs with Markov chains
Algorithms for Molecular Biology (IF 1), Pub Date: 2020-11-13, DOI: 10.1186/s13015-020-00179-w
Milad Miladi 1 , Martin Raden 1 , Sebastian Will 2, 3 , Rolf Backofen 1, 4

Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and for their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of $$O(n^6)$$ in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce this extreme complexity, either by simplifying the energy model or by imposing restrictions on the predicted alignments. Here, we introduce a novel variant of Sankoff's algorithm that reconciles the simplification made by PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities. These make it possible to accurately capture structure probabilities, which obey a conditional dependency. This model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains). Pankov benefits from the speed-up gained by excluding unreliable base pairs without compromising the loop-based free energy model of Sankoff's algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA.
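The difference between unconditional and conditional base pair probabilities can be illustrated with a toy calculation. The sketch below is not Pankov's algorithm. It assumes a hypothetical first-order dependency in which each base pair is conditioned only on its directly enclosing pair, and it uses made-up probabilities for a two-pair helix. The helper names (`enclosing_pair`, `chain_probability`, `pseudo_energy`, etc.) and the scale `kT` are illustrative assumptions.

```python
import math

# Toy illustration only (hypothetical numbers, not Pankov's model): a nested RNA
# structure is a list of base pairs (i, j) with i < j, using 1-based positions.


def enclosing_pair(pair, structure):
    """Return the innermost pair of `structure` enclosing `pair`, or None (exterior loop)."""
    i, j = pair
    parents = [(k, l) for (k, l) in structure if k < i and j < l]
    # In a nested structure, the innermost enclosing pair has the largest left end k.
    return max(parents, default=None)


def independent_probability(structure, uncond_prob):
    """Multiply unconditional pair probabilities, i.e. treat all pairs as independent."""
    p = 1.0
    for pair in structure:
        p *= uncond_prob[pair]
    return p


def chain_probability(structure, cond_prob):
    """Multiply P(pair | enclosing pair): a hypothetical first-order Markov-chain factorization."""
    p = 1.0
    for pair in structure:
        p *= cond_prob[(pair, enclosing_pair(pair, structure))]
    return p


def pseudo_energy(p, kT=1.0):
    """Convert a probability into a pseudo-energy -kT * ln(p); kT is an arbitrary scale here."""
    return -kT * math.log(p)


# Hypothetical two-pair helix: (2, 9) is assumed to form only when stacked inside (1, 10),
# so the joint probability of the whole helix equals P((2, 9)) = 0.75.
structure = [(1, 10), (2, 9)]
uncond_prob = {(1, 10): 0.8, (2, 9): 0.75}
cond_prob = {
    ((1, 10), None): 0.8,           # (1, 10) closes a loop in the exterior loop
    ((2, 9), (1, 10)): 0.75 / 0.8,  # P((2, 9) | (1, 10)) = 0.9375
}

p_ind = independent_probability(structure, uncond_prob)
p_chain = chain_probability(structure, cond_prob)
print(f"independent: p={p_ind:.3f}, E={pseudo_energy(p_ind):.3f}")    # independent: p=0.600, E=0.511
print(f"conditional: p={p_chain:.3f}, E={pseudo_energy(p_chain):.3f}")  # conditional: p=0.750, E=0.288
```

With these made-up numbers, treating the two stacked pairs as independent underestimates the helix probability (0.6 instead of 0.75) and overestimates its pseudo-energy. Conditioning each pair on its enclosing pair is intended to avoid this kind of error for correlated pairs.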

Updated: 2020-11-15