The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment
Systematic Biology (IF 6.1), Pub Date: 2020-07-12, DOI: 10.1093/sysbio/syaa050
Nicola De Maio

Sequence alignment is essential for phylogenetic and molecular evolution inference, as well as in many other areas of bioinformatics and evolutionary biology. Inaccurate alignments can lead to severe biases in most downstream statistical analyses. Statistical alignment based on probabilistic models of sequence evolution addresses these issues by replacing heuristic score functions with evolutionary model-based probabilities. However, score-based aligners and fixed-alignment phylogenetic approaches are still more prevalent than methods based on evolutionary indel models, mostly due to computational convenience. Here, I present new techniques for improving the accuracy and speed of statistical evolutionary alignment. The "cumulative indel model" approximates realistic evolutionary indel dynamics using differential equations. "Adaptive banding" reduces the computational demand of most alignment algorithms without requiring prior knowledge of divergence levels or pseudo-optimal alignments. Using simulations, I show that these methods lead to fast and accurate pairwise alignment inference. Also, I show that it is possible, with these methods, to align and infer evolutionary parameters from a single long synteny block (≈ 530kbp) between the human and chimp genomes. The cumulative indel model and adaptive banding can therefore improve the performance of alignment and phylogenetic methods.
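
To make the idea of banding without prior knowledge of divergence concrete, here is a minimal sketch under loud assumptions. It is not the paper's adaptive banding procedure and it uses plain edit distance rather than an evolutionary indel model. It shows a generic band-doubling scheme: the dynamic programming matrix is filled only within a band around the main diagonal, and the band is widened until the result is provably unaffected by it. The function names `banded_edit_distance` and `adaptive_edit_distance` are hypothetical and chosen for illustration only.

```python
INF = float("inf")


def banded_edit_distance(a, b, k):
    """Edit distance using only cells (i, j) with |i - j| <= k.

    Returns INF when the band cannot reach the final cell.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return INF
    # Row 0: aligning the empty prefix of a with b[:j] costs j insertions.
    prev = {j: j for j in range(0, min(m, k) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                cur[j] = i  # i deletions
                continue
            substitute = prev.get(j - 1, INF) + (a[i - 1] != b[j - 1])
            delete = prev.get(j, INF) + 1
            insert = cur.get(j - 1, INF) + 1
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[m]


def adaptive_edit_distance(a, b, start_band=1):
    """Double the band half-width until the banded result is exact.

    An alignment of cost d contains at most d indels, so it never strays
    more than d diagonals from the main diagonal. If the banded cost is
    <= k, the optimal path therefore lies inside the band.
    """
    k = max(start_band, abs(len(a) - len(b)), 1)
    while True:
        d = banded_edit_distance(a, b, k)
        if d <= k or k >= max(len(a), len(b)):
            return d, k
        k *= 2


if __name__ == "__main__":
    distance, band = adaptive_edit_distance("kitten", "sitting")
    print(distance, band)
```

For similar sequences the loop stops at a narrow band, so only a thin strip of the matrix is ever computed; for divergent sequences the band grows until it covers the whole matrix and the result matches the unbanded algorithm.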

Updated: 2020-07-12