Fast phylogenetic inference from typing data.
Algorithms for Molecular Biology (IF 1), Pub Date: 2018-02-15, DOI: 10.1186/s13015-017-0119-7
João A Carriço 1, Maxime Crochemore 2, Alexandre P Francisco 3,4, Solon P Pissis 2, Bruno Ribeiro-Gonçalves 1, Cátia Vaz 3,5

BACKGROUND: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance because sequence and allelic profile data are inherently portable, analysis is fast, and the methods support common nomenclatures for strains or clones. This has led to the development of several novel methods and to databases being made available for many microbial species. With the mainstream use of high-throughput sequencing, these databases are accumulating large amounts of data and store thousands of different profiles. At the same time, computing genetic evolutionary distances among a set of typing profiles or taxa dominates the running time of many phylogenetic inference methods. Notably, most definitions of genetic evolutionary distance rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles.

RESULTS: We propose an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results for the proposed algorithm. We further show how the algorithm can be integrated into a well-known phylogenetic inference method, and how it can be used to speed up queries for local phylogenetic patterns over large typing databases.
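To make the thresholded problem concrete, the following is a minimal Python sketch of finding all profile pairs whose Hamming distance is at most k. It is not the authors' algorithm and makes no claim to their average-case linear-time bound. It uses a standard pigeonhole filter as a stand-in: if two equal-length profiles differ in at most k positions, then after splitting the positions into k + 1 blocks at least one block must match exactly, so only profiles sharing a block need to be compared. The function names `hamming` and `pairs_within_threshold` and the representation of profiles as equal-length tuples are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b, limit=None):
    """Count mismatching positions; stop early once the count exceeds limit."""
    d = 0
    for x, y in zip(a, b):
        if x != y:
            d += 1
            if limit is not None and d > limit:
                return d
    return d


def pairs_within_threshold(profiles, k):
    """Return {(i, j): distance} for all i < j with Hamming distance <= k.

    Illustrative pigeonhole filter, not the method from the paper.
    """
    if not profiles:
        return {}
    m = len(profiles[0])
    if any(len(p) != m for p in profiles):
        raise ValueError("all profiles must have the same length")

    n = len(profiles)
    if k >= m:
        # Every pair qualifies trivially, so no filtering is possible.
        candidates = set(combinations(range(n), 2))
    else:
        # Split positions into k + 1 non-empty blocks.
        bounds = [i * m // (k + 1) for i in range(k + 2)]
        candidates = set()
        for lo, hi in zip(bounds, bounds[1:]):
            buckets = defaultdict(list)
            for idx, p in enumerate(profiles):
                buckets[tuple(p[lo:hi])].append(idx)
            # Profiles sharing an identical block are candidate pairs.
            for members in buckets.values():
                candidates.update(combinations(members, 2))

    result = {}
    for i, j in candidates:
        d = hamming(profiles[i], profiles[j], limit=k)
        if d <= k:
            result[(i, j)] = d
    return result


if __name__ == "__main__":
    demo = [(1, 2, 3, 4), (1, 2, 3, 5), (9, 9, 9, 9), (1, 2, 7, 5)]
    print(pairs_within_threshold(demo, 1))
```

The filter only pays off when k is small relative to the profile length and few profiles share blocks; in the worst case every pair is still a candidate and the cost is quadratic in the number of profiles.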

Updated: 2019-11-01