An Application of Random Walk Resampling to Phylogenetic HMM Inference and Learning.
IEEE Transactions on NanoBioscience (IF 3.9), Pub Date: 2020-05-08, DOI: 10.1109/tnb.2020.2991302
Wei Wang, Qiqige Wuyun, Kevin J Liu

Statistical resampling methods are widely used for confidence interval placement and as a data perturbation technique for statistical inference and learning. An important assumption of popular resampling methods such as the standard bootstrap is that input observations are identically and independently distributed (i.i.d.). However, within the area of computational biology and bioinformatics, many different factors can contribute to intra-sequence dependence, such as recombination and other evolutionary processes governing sequence evolution. The SEquential RESampling (“SERES”) framework was previously proposed to relax the simplifying assumption of i.i.d. input observations. SERES resampling takes the form of random walks on an input of either aligned or unaligned biomolecular sequences. This study introduces the first application of SERES random walks on aligned sequence inputs and is also the first to demonstrate the utility of SERES as a data perturbation technique to yield improved statistical estimates. We focus on the classical problem of recombination-aware local genealogical inference. We show in a simulation study that coupling SERES resampling and re-estimation with recHMM, a hidden Markov model-based method, produces local genealogical inferences with consistent and often large improvements in terms of topological accuracy. We further evaluate method performance using empirical HIV genome sequence datasets.
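
To make the idea of resampling by a random walk along an alignment concrete, the following is a minimal sketch and not the authors' SERES implementation. It assumes a simplified walk that starts at a random column, moves one column at a time, reverses direction with a fixed probability, and reflects at the alignment ends. The function name `random_walk_resample` and the parameters `length`, `reverse_prob`, and `seed` are hypothetical, introduced only for illustration. Because neighboring columns stay adjacent in the replicate, local dependence between sites is retained, unlike a standard bootstrap that draws columns independently.

```python
import random


def random_walk_resample(alignment, length=None, reverse_prob=0.1, seed=None):
    """Resample alignment columns with a simple reflecting random walk.

    Illustrative assumption only: the walk rule (fixed reversal
    probability, reflection at both ends) is a simplification and is
    not taken from the paper.
    """
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all sequences in an alignment must have equal length")
    if length is None:
        length = n_sites  # default: replicate has as many columns as the input

    rng = random.Random(seed)
    pos = rng.randrange(n_sites)   # random starting column
    step = rng.choice((-1, 1))     # random initial direction
    visited = []

    for _ in range(length):
        visited.append(pos)
        # Occasionally turn around in the interior of the alignment.
        if rng.random() < reverse_prob:
            step = -step
        # Reflect if the next move would leave the alignment.
        if not 0 <= pos + step < n_sites:
            step = -step
        # Guard for the degenerate single-column case.
        if 0 <= pos + step < n_sites:
            pos += step

    # Apply the same column indices to every sequence so rows stay aligned.
    replicate = ["".join(row[i] for i in visited) for row in alignment]
    return replicate, visited


if __name__ == "__main__":
    toy_alignment = ["ACGTACGT", "ACGAACGA", "TCGTACGT"]
    replicate, columns = random_walk_resample(toy_alignment, length=12, seed=0)
    print(columns)
    for row in replicate:
        print(row)
```

In a resampling-and-re-estimation workflow of the kind the abstract describes, each replicate would be passed to the downstream estimator and the results aggregated; that step is omitted here because its details are not given in this summary.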

Updated: 2020-07-03