Contaminant DNA in bacterial sequencing experiments is a major source of false genetic variability.
BMC Biology (IF 4.4), Pub Date: 2020-03-02, DOI: 10.1186/s12915-020-0748-z
Galo A Goig 1 , Silvia Blanco 2 , Alberto L Garcia-Basteiro 2, 3 , Iñaki Comas 1, 4

BACKGROUND: Contaminant DNA is a well-known confounding factor in molecular biology and in genomic repositories. Strikingly, analysis workflows for whole-genome sequencing (WGS) data commonly do not account for errors that contamination may introduce, which can lead to incorrect estimates of allele frequency in both basic and clinical research.

RESULTS: We used a taxonomic filter to remove contaminant reads from more than 4000 bacterial samples from 20 different studies and performed a comprehensive evaluation of the extent and impact of contaminant DNA in WGS. We found that contamination is pervasive and can introduce large biases in variant analysis. We showed that these biases can result in hundreds of false-positive and false-negative SNPs, even for samples with slight contamination. Studies investigating complex biological traits from sequencing data can be completely biased if contamination is neglected during the bioinformatic analysis, and we demonstrate that removing contaminant reads with a taxonomic classifier permits more accurate variant calling. We used both real and simulated data to evaluate and implement reliable, contamination-aware analysis pipelines.

CONCLUSION: As sequencing technologies consolidate as precision tools that are increasingly adopted in research and clinical contexts, our results call for the implementation of contamination-aware analysis pipelines. Taxonomic classifiers are a powerful tool for implementing such pipelines.
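The mechanism the abstract describes, contaminant reads shifting the observed allele frequency at a site, can be illustrated with a small Python sketch. The read counts, the sites, and the two calling thresholds (0.9 for a fixed SNP, 0.1 for a minority variant) are hypothetical values chosen for illustration, not figures from the study. The `contaminant` flag stands in for what a taxonomic classifier would estimate; in real data the origin of each read is not known in advance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    base: str          # base observed at the site of interest
    contaminant: bool  # True if the read truly comes from a contaminating organism


def alt_frequency(reads, ref_base):
    """Fraction of reads covering the site that carry a non-reference base."""
    if not reads:
        return 0.0
    return sum(r.base != ref_base for r in reads) / len(reads)


def call(freq, fixed=0.9, minority=0.1):
    """Hypothetical calling rule; both thresholds are illustrative assumptions."""
    if freq >= fixed:
        return "fixed SNP"
    if freq >= minority:
        return "minority variant"
    return "no variant"


def compare(reads, ref_base):
    """Call the site using all reads, then with contaminant reads removed."""
    clean = [r for r in reads if not r.contaminant]
    return call(alt_frequency(reads, ref_base)), call(alt_frequency(clean, ref_base))


# Site 1 (hypothetical): the sample truly carries T (reference C),
# but 15 mis-mapped contaminant reads carry C.
site1 = [Read("T", False)] * 85 + [Read("C", True)] * 15
# Site 2 (hypothetical): the sample truly carries the reference C,
# but 12 mis-mapped contaminant reads carry T.
site2 = [Read("C", False)] * 90 + [Read("T", True)] * 12

print(compare(site1, "C"))  # ('minority variant', 'fixed SNP'): the fixed SNP is missed
print(compare(site2, "C"))  # ('minority variant', 'no variant'): a spurious variant appears
```

At site 1 the contaminant reads dilute a true fixed SNP to 0.85, below the fixed-SNP threshold (a false negative); at site 2 they push the alternative allele to about 0.12, enough to report a variant that is not in the sample (a false positive). Removing the contaminant reads restores the correct call in both cases.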

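Removing contaminant reads with a taxonomic classifier can be sketched as a filter over per-read classifications. This sketch assumes Kraken2-style standard output (tab-separated: C/U flag, read ID, taxonomy ID, sequence length, k-mer LCA list, produced without `--use-names`) and keeps only reads assigned to a target taxon or one of its descendants. The target taxon, the simplified parent map, and the `keep_unclassified` policy are illustrative assumptions, not the pipeline used in the study; in practice the parent links would come from the classifier's taxonomy files (for NCBI taxonomy, `nodes.dmp`).

```python
def is_descendant(taxid, ancestor, parents):
    """Walk parent links from taxid toward the root, looking for ancestor."""
    seen = set()
    while taxid not in seen:
        if taxid == ancestor:
            return True
        seen.add(taxid)
        parent = parents.get(taxid)
        if parent is None or parent == taxid:  # unknown taxon or reached the root
            return False
        taxid = parent
    return False


def reads_to_keep(kraken_lines, target, parents, keep_unclassified=False):
    """Return IDs of reads classified to the target taxon or below it."""
    keep = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        status, read_id, taxid = fields[0], fields[1], int(fields[2])
        if status == "U":
            if keep_unclassified:  # assumed policy switch for unclassified reads
                keep.add(read_id)
        elif is_descendant(taxid, target, parents):
            keep.add(read_id)
    return keep


# Simplified NCBI-style parent links (intermediate ranks omitted):
# 1773 M. tuberculosis -> 77643 M. tuberculosis complex -> 1763 Mycobacterium
# -> 2 Bacteria -> 1 root; 9606 Homo sapiens is attached directly to the root here.
parents = {1773: 77643, 77643: 1763, 1763: 2, 2: 1, 9606: 1, 1: 1}

# Hypothetical classifier output for five reads.
kraken_output = [
    "C\tread1\t1773\t150\t1773:116",
    "C\tread2\t9606\t150\t9606:116",
    "U\tread3\t0\t150\t0:116",
    "C\tread4\t77643\t150\t77643:116",
    "C\tread5\t1763\t150\t1763:116",
]

print(sorted(reads_to_keep(kraken_output, 77643, parents)))  # ['read1', 'read4']
```

Under this rule, a read classified only to genus level (`read5`) is discarded because it cannot be confidently assigned to the target; a more permissive filter could keep such reads at the cost of retaining more potential contaminants. The retained read IDs would then be used to subset the reads before mapping and variant calling.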
Updated: 2020-04-22