ARPIP: Ancestral sequence Reconstruction with insertions and deletions under the Poisson Indel Process
Systematic Biology (IF 6.5), Pub Date: 2022-07-22, DOI: 10.1093/sysbio/syac050
Gholamhossein Jowkar 1,2,3, Jūlija Pečerska 1,2, Massimo Maiolo 1,2,4, Manuel Gil 1,2, Maria Anisimova 1,2

Modern phylogenetic methods allow inference of ancestral molecular sequences given an alignment and phylogeny relating present-day sequences. This provides insight into the evolutionary history of molecules, helping to understand gene function and to study biological processes such as adaptation and convergent evolution across a variety of applications. Here we propose a dynamic programming algorithm for fast joint likelihood-based reconstruction of ancestral sequences under the Poisson Indel Process (PIP). Unlike previous approaches, our method, named ARPIP, enables the reconstruction with insertions and deletions based on an explicit indel model. Consequently, inferred indel events have an explicit biological interpretation. Likelihood computation is achieved in linear time with respect to the number of sequences. Our method consists of two steps, namely finding the most probable indel points and reconstructing ancestral sequences. First, we find the most likely indel points and prune the phylogeny to reflect the insertion and deletion events per site. Second, we infer the ancestral states on the pruned subtree in a manner similar to FastML. We applied ARPIP to simulated datasets and to real data from the Betacoronavirus genus. ARPIP reconstructs both the indel events and substitutions with a high degree of accuracy. Our method fares well when compared to established state-of-the-art methods such as FastML and PAML. Moreover, the method can be extended to explore both optimal and suboptimal reconstructions, to include rate heterogeneity through time, and more. We believe it will expand the range of novel applications of ancestral sequence reconstruction.
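
The second step described in the abstract, joint likelihood-based reconstruction of ancestral states by dynamic programming, can be illustrated with a small sketch for a single alignment column. This is not the ARPIP implementation: the PIP-based choice of indel points and the pruning of the tree are not shown, the Jukes-Cantor substitution model with uniform root frequencies is an assumption made only for brevity, and every name below (`jc69`, `joint_reconstruct`, the tree layout) is hypothetical.

```python
import math

STATES = "ACGT"

def jc69(i, j, t):
    """Jukes-Cantor probability of state i becoming state j along branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def joint_reconstruct(tree, root, leaf_states):
    """Joint maximum-likelihood states for one site.

    tree: {internal_node: [(child, branch_length), ...]} (hypothetical layout)
    leaf_states: {leaf: observed state}
    Returns ({node: state}, likelihood of the best joint assignment).
    """
    # best[node][parent_state] = (best subtree likelihood, state chosen for node)
    best = {}

    def below(children, j):
        # Product of the children's best likelihoods given that the parent is in state j.
        return math.prod(best[c][j][0] for c, _ in children)

    def up(node, t):
        children = tree.get(node, [])
        if not children:
            # Leaf: the state is observed, so only the branch probability matters.
            a = leaf_states[node]
            best[node] = {i: (jc69(i, a, t), a) for i in STATES}
            return
        for c, tc in children:
            up(c, tc)
        # Internal node: for each parent state i, keep the best own state j.
        best[node] = {
            i: max((jc69(i, j, t) * below(children, j), j) for j in STATES)
            for i in STATES
        }

    for c, tc in tree[root]:
        up(c, tc)
    # Root: uniform stationary frequencies (0.25 each) under Jukes-Cantor.
    lik, root_state = max((0.25 * below(tree[root], j), j) for j in STATES)

    # Traceback from the root down, reading off the stored choices.
    states = {root: root_state}
    stack = [root]
    while stack:
        node = stack.pop()
        for c, _ in tree.get(node, []):
            states[c] = best[c][states[node]][1]
            stack.append(c)
    return states, lik

# Toy three-leaf tree with one internal node besides the root.
tree = {"root": [("n1", 0.1), ("C", 0.2)], "n1": [("A", 0.1), ("B", 0.1)]}
states, lik = joint_reconstruct(tree, "root", {"A": "G", "B": "G", "C": "T"})
print(states, lik)
```

Each node is visited once on the way up and once during the traceback, so the work per site grows linearly with the number of nodes. In a method like the one described, such a pass would presumably be run per site on the subtree that remains after the inferred insertion and deletion points have been applied.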

Updated: 2022-07-22