Within-species contamination of bacterial whole-genome sequence data has a greater influence on clustering analyses than between-species contamination
Genome Biology (IF 12.3), Pub Date: 2019-12-01, DOI: 10.1186/s13059-019-1914-x
Arthur W Pightling¹, James B Pettengill¹, Yu Wang¹, Hugh Rand¹, Errol Strain¹

Although it is assumed that contamination in bacterial whole-genome sequencing causes errors, the influences of contamination on clustering analyses, such as single-nucleotide polymorphism discovery, phylogenetics, and multi-locus sequence typing, have not been quantified. By developing and analyzing 720 Listeria monocytogenes, Salmonella enterica, and Escherichia coli short-read datasets, we demonstrate that within-species contamination causes errors that confound clustering analyses, while between-species contamination generally does not. The sources of those errors are contaminant reads that map to references or that become incorporated into chimeric sequences during assembly. Contamination sufficient to influence clustering analyses is present in public sequence databases.
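Why within-species contaminants are the more troublesome case can be illustrated with a toy reference-mapping example. The Python below is a minimal sketch and not the authors' pipeline. It assumes a hypothetical majority-allele caller (`call_site`, with made-up `min_fraction` and `min_depth` thresholds) and a hypothetical `simulate_pileup` helper. It also makes the simplifying assumption that reads from a different species never align to the reference, while reads from a same-species contaminant always do. Chimeric assemblies, the other error source named above, are not modeled.

```python
from collections import Counter


def call_site(bases, min_fraction=0.9, min_depth=10):
    """Return the consensus base at one reference position, or 'N'.

    Hypothetical caller (thresholds are illustrative assumptions): 'N' means
    the site is too shallow (< min_depth) or no allele reaches min_fraction
    of the mapped reads.
    """
    depth = len(bases)
    if depth < min_depth:
        return "N"
    base, count = Counter(bases).most_common(1)[0]
    return base if count / depth >= min_fraction else "N"


def simulate_pileup(depth, contam_fraction, target_base, contam_base, contam_maps):
    """Build a toy pileup in which contam_fraction of the reads come from a contaminant.

    contam_maps=True models a within-species contaminant whose reads align
    to the reference; contam_maps=False models a between-species contaminant
    whose reads (simplifying assumption) fail to align at all.
    """
    n_contam = round(depth * contam_fraction)
    bases = [target_base] * (depth - n_contam)
    if contam_maps:
        bases += [contam_base] * n_contam
    return bases


if __name__ == "__main__":
    # The target isolate carries 'A' at this position; the contaminant strain carries 'G'.
    for fraction in (0.0, 0.05, 0.2, 0.95):
        within = call_site(simulate_pileup(100, fraction, "A", "G", contam_maps=True))
        between = call_site(simulate_pileup(100, fraction, "A", "G", contam_maps=False))
        print(f"contamination={fraction:.2f}  within-species call={within}  between-species call={between}")
```

With these illustrative thresholds, a 20% same-species contaminant turns a confident 'A' into an ambiguous 'N', and a dominant one flips the call to the contaminant's 'G'. Unmapped reads from another species only reduce coverage at the site. Real SNP pipelines use base qualities, mapping qualities, and other filters, so this sketch shows the direction of the effect rather than its magnitude.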

Updated: 2019-12-01