Genealogical inference and more flexible sequence clustering using iterative-PopPUNK
Genome Research (IF 7), Pub Date: 2023-06-01, DOI: 10.1101/gr.277395.122
Bin Zhao 1, John A Lees 2,3, Hongjin Wu 1, Chao Yang 4, Daniel Falush 4

Bacterial genome data are accumulating at an unprecedented speed due to the routine use of sequencing in clinical diagnoses, public health surveillance, and population genetics studies. Genealogical reconstruction is fundamental to many of these uses; however, inferring genealogy from large-scale genome data sets quickly, accurately, and flexibly is still a challenge. Here, we extend an alignment- and annotation-free method, PopPUNK, to increase its flexibility and interpretability across data sets. Our method, iterative-PopPUNK, rapidly produces multiple consistent cluster assignments across a range of sequence identities. By constructing a partially resolved genealogical tree with respect to these clusters, users can select a resolution most appropriate for their needs. We showed the accuracy of clusters at all levels of similarity and genealogical inference of iterative-PopPUNK based on simulated data and obtained phylogenetically concordant results in real data sets from seven bacterial species. Using two example sets of Escherichia/Shigella and Vibrio parahaemolyticus genomes, we show that iterative-PopPUNK can achieve cluster resolutions ranging from phylogroup down to sequence typing (ST). The iterative-PopPUNK algorithm is implemented in the “PopPUNK_iterate” program, available as part of the PopPUNK package.
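
The idea of producing consistent cluster assignments across a range of sequence identities, then arranging them into a partially resolved tree, can be illustrated with a toy sketch. This is not the PopPUNK_iterate implementation: it assumes a single precomputed pairwise distance matrix and a hand-picked list of cutoffs, and the function names `connected_components` and `iterative_clusters` are hypothetical. Clusters are simply connected components at each cutoff, so tighter cutoffs can only split existing clusters, which is what makes the levels nest.

```python
from itertools import combinations


def connected_components(n, dist, cutoff):
    """Group samples linked by any pairwise distance <= cutoff (single linkage)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def iterative_clusters(dist, cutoffs):
    """Map each cluster to its parent cluster across progressively tighter cutoffs.

    Illustrative assumption: one scalar distance per pair and a fixed cutoff list.
    """
    n = len(dist)
    root = frozenset(range(n))
    tree = {root: None}  # the root holds every sample and has no parent
    previous = {root}
    for cutoff in sorted(cutoffs, reverse=True):  # loosest cutoff first
        current = connected_components(n, dist, cutoff)
        for cluster in current:
            if cluster in tree:
                continue  # cluster did not split at this cutoff
            # Nesting guarantees exactly one cluster from the previous level contains it.
            tree[cluster] = next(p for p in previous if cluster <= p)
        previous = current
    return tree


if __name__ == "__main__":
    # Four samples: 0 and 1 are close, 2 and 3 are slightly less close.
    d = [
        [0.00, 0.01, 0.10, 0.10],
        [0.01, 0.00, 0.10, 0.10],
        [0.10, 0.10, 0.00, 0.02],
        [0.10, 0.10, 0.02, 0.00],
    ]
    for child, parent in iterative_clusters(d, [0.05, 0.015]).items():
        print(sorted(child), "<-", sorted(parent) if parent else None)
```

Reading the resulting mapping from the root downward gives a tree in which a user can stop at whichever level of resolution suits the analysis; clusters that never split further stay as unresolved leaves.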

Updated: 2023-06-01