Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel
Microbial Genomics (IF 4.0), Pub Date: 2021-09-23, DOI: 10.1099/mgen.0.000651
Geneviève Labbé 1 , Peter Kruczkiewicz 2 , James Robertson 1 , Philip Mabon 3 , Justin Schonfeld 1 , Daniel Kein 3 , Marisa A Rankin 1 , Matthew Gopez 3 , Darian Hole 3 , David Son 1 , Natalie Knox 3, 4 , Chad R Laing 5 , Kyrylo Bessonov 1 , Eduardo N Taboada 3 , Catherine Yoshida 3 , Kim Ziebell 1 , Anil Nichani 1 , Roger P Johnson 1 , Gary Van Domselaar 3, 5 , John H E Nash 6

Hierarchical genotyping approaches can provide insights into the source, geography and temporal distribution of bacterial pathogens. Multiple hierarchical SNP genotyping schemes have previously been developed so that new isolates can rapidly be placed within pre-computed population structures, without the need to rebuild phylogenetic trees for the entire dataset. This classification approach has, however, seen limited uptake in routine public health settings due to analytical complexity and the lack of standardized tools that provide clear and easy ways to interpret results. The BioHansel tool was developed to provide an organism-agnostic tool for hierarchical SNP-based genotyping. The tool identifies split k-mers that distinguish predefined lineages in whole genome sequencing (WGS) data using SNP-based genotyping schemes. BioHansel uses the Aho-Corasick algorithm to type isolates from assembled genomes or raw read sequence data in a matter of seconds, with limited computational resources. This makes BioHansel ideal for use by public health agencies that rely on WGS methods for surveillance of bacterial pathogens. Genotyping results are evaluated using a quality assurance module which identifies problematic samples, such as low-quality or contaminated datasets. Using existing hierarchical SNP schemes for Mycobacterium tuberculosis and Salmonella Typhi, we compare the genotyping results obtained with the k-mer-based tools BioHansel and SKA, with those of the organism-specific tools TBProfiler and genotyphi, which use gold-standard reference-mapping approaches. We show that the genotyping results are fully concordant across these different methods, and that the k-mer-based tools are significantly faster. We also test the ability of the BioHansel quality assurance module to detect intra-lineage contamination and demonstrate that it is effective, even in populations with low genetic diversity. We demonstrate the scalability of the tool using a dataset of ~8100 S. Typhi public genomes and provide the aggregated results of geographical distributions as part of the tool’s output. BioHansel is an open source Python 3 application available on PyPI and Conda repositories and as a Galaxy tool from the public Galaxy Toolshed. In a public health context, BioHansel enables rapid and high-resolution classification of bacterial pathogens with low genetic diversity.
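
The idea of k-mer-based hierarchical genotyping described in the abstract can be illustrated with a small sketch. This is not BioHansel's code or API: the scheme, the k-mer sequences, the genotype labels and every function name below are hypothetical, and a plain substring search stands in for the Aho-Corasick multi-pattern search that the abstract names. The mixed-sample flag is likewise only a toy stand-in for a quality check, under the assumption that a clean isolate matches k-mers from a single lineage path.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy scheme (all sequences and labels are invented):
# each 7-mer is assumed to carry a lineage-defining SNP and maps to a
# hierarchical genotype label. Real schemes use longer k-mers and
# far more markers.
TOY_SCHEME: Dict[str, str] = {
    "ACGTACG": "1",
    "TTGACCA": "1.2",
    "GGCATTC": "1.2.3",
    "CCATGGA": "2",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_genotypes(sequence: str, scheme: Dict[str, str]) -> List[str]:
    """Return sorted labels whose k-mer occurs on either strand.

    Naive substring search, used here in place of Aho-Corasick.
    The "N" separator prevents matches spanning the two strands.
    """
    both_strands = sequence + "N" + reverse_complement(sequence)
    return sorted({label for kmer, label in scheme.items() if kmer in both_strands})


def ancestors(label: str) -> List[str]:
    """Parent lineages of a label, e.g. "1.2.3" -> ["1", "1.2"]."""
    parts = label.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def call_genotype(labels: List[str]) -> Tuple[Optional[str], bool]:
    """Return (most specific supported label, mixed-sample flag).

    A label is supported only if all of its parent lineages were found.
    The flag is True when some found label lies off the called lineage path.
    """
    found = sorted(set(labels))
    supported = [lab for lab in found if all(a in found for a in ancestors(lab))]
    if not supported:
        return None, False
    call = max(supported, key=lambda lab: lab.count("."))
    path = set(ancestors(call)) | {call}
    mixed = any(lab not in path for lab in found)
    return call, mixed


# Invented sequence carrying the markers for 1, 1.2 and (on the reverse
# strand) 1.2.3.
sample = (
    "AAAA" + "ACGTACG" + "CCCC" + "TTGACCA" + "GGGG"
    + reverse_complement("GGCATTC") + "TT"
)
print(call_genotype(find_genotypes(sample, TOY_SCHEME)))
```

Because the call depends only on which predefined k-mers are present, a new isolate can be placed in the hierarchy without rebuilding a phylogenetic tree, which is the property the abstract emphasizes.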

Updated: 2021-09-24