Read trimming has minimal effect on bacterial SNP-calling accuracy
Microbial Genomics (IF 4.0), Pub Date: 2020-12-01, DOI: 10.1099/mgen.0.000434
Stephen J Bush

Read alignment is the central step of many analytic pipelines that perform variant calling. To reduce error, it is common practice to pre-process raw sequencing reads to remove low-quality bases and residual adapter contamination, a procedure collectively known as ‘trimming’. Trimming is widely assumed to increase the accuracy of variant calling, although there are relatively few systematic evaluations of its effects and no clear consensus on its efficacy. As sequencing datasets grow in both number and size, it is worth reappraising computational operations of ambiguous benefit, particularly now that many analyses routinely incorporate thousands of samples, which increases the time and cost required. Using a curated set of 17 Gram-negative bacterial genomes, this study initially evaluated the impact of four read-trimming utilities (Atropos, fastp, Trim Galore and Trimmomatic), each used with a range of stringencies, on the accuracy and completeness of three bacterial SNP-calling pipelines. Read trimming produced only small increases in SNP-calling accuracy, and these were not statistically significant, even with fastp, the highest-performing pre-processor in this study. To extend these findings, >6500 publicly archived sequencing datasets from Escherichia coli, Mycobacterium tuberculosis and Staphylococcus aureus were re-analysed using a common analytic pipeline. Approximately 125 million SNPs and 1.25 million indels were called across all samples. Of these, the same bases were called in 98.8 % and 91.9 % of cases, respectively, irrespective of whether raw or trimmed reads were used. Nevertheless, trimming significantly reduced the proportion of mixed calls, meaning calls where <100 % of the reads support the variant allele, which were considered a proxy for false positives. This suggests that while trimming rarely alters the set of variant bases, it can affect the proportion of reads supporting each call.
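The trimming step described above can be illustrated with a minimal Python sketch. It does not reproduce any of the four tools evaluated. It makes several assumptions:

- Phred scores are already decoded to integers.
- 3′ quality trimming uses a BWA-style running-sum rule, the approach cutadapt documents (Trim Galore wraps cutadapt).
- Adapters are matched exactly, with no mismatches.

The function names, the cutoff of 20, the minimum length and the example read are hypothetical. The adapter string is the common Illumina adapter prefix.

```python
from typing import List, Optional, Tuple


def quality_trim_3prime(quals: List[int], cutoff: int = 20) -> int:
    """Return the index at which to cut off a low-quality 3' tail.

    BWA-style running sum: walk from the 3' end adding (cutoff - q),
    stop once the sum turns negative, and cut where the sum peaked.
    """
    running = 0
    best = 0
    cut = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def adapter_trim_3prime(seq: str, adapter: str, min_overlap: int = 3) -> int:
    """Return the index where adapter sequence starts, or len(seq) if none.

    Exact matching only (a simplification; real trimmers allow mismatches).
    """
    # Case 1: the full adapter occurs inside the read.
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # Case 2: the read ends partway into the adapter; find the longest
    # adapter prefix (at least min_overlap bases) that is a suffix of the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def trim_read(seq: str, quals: List[int], adapter: str,
              cutoff: int = 20, min_len: int = 20) -> Optional[Tuple[str, List[int]]]:
    """Adapter-trim, then quality-trim; discard reads shorter than min_len."""
    end = adapter_trim_3prime(seq, adapter)
    end = quality_trim_3prime(quals[:end], cutoff)
    if end < min_len:
        return None
    return seq[:end], quals[:end]


# Hypothetical read: 20 genomic bases whose last two have poor quality,
# followed by the first 7 bases of an Illumina-style adapter.
read = "ACGTACGTACGTACGTACGT" + "AGATCGG"
phred = [35] * 18 + [10, 8] + [30] * 7
# Adapter cut at index 20, quality cut at index 18: keeps the first 18 bases.
print(trim_read(read, phred, "AGATCGGAAGAGC", min_len=15))
```

Applying adapter trimming before quality trimming is one possible ordering. Tools differ in ordering, in mismatch tolerance and in how "stringency" maps onto parameters such as the cutoff and minimum length.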
It was concluded that read quality- and adapter-trimming add relatively little value to a SNP-calling pipeline. Trimming may only be necessary if small differences in the absolute number of SNP calls, or in the false call rate, are critical. Broadly similar conclusions can be drawn about the utility of trimming in an indel-calling pipeline. Read trimming likely remains routine before variant calling out of concern that omitting it would typically have negative consequences. While this may historically have been the case, the data in this study suggest that read trimming is not always a practical necessity.
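The two quantities behind these conclusions can be made concrete with a hedged Python sketch. The first is the share of calls on which raw-read and trimmed-read analyses agree. The second is the proportion of mixed calls, where <100 % of reads support the variant allele.

The sketch assumes each call records the reads supporting the variant allele and the total depth at the site. The `Call` record is hypothetical, not a VCF parser. Concordance is defined here as the share of sites, across the union of both call sets, at which both analyses called the same base. That is one plausible reading, not necessarily the study's exact method. The example calls are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Call:
    """Hypothetical minimal variant call (not a VCF record)."""
    chrom: str
    pos: int
    alt: str        # called variant base
    alt_reads: int  # reads supporting the variant allele
    depth: int      # total reads covering the site


def is_mixed(call: Call) -> bool:
    # Mixed call: fewer than 100 % of reads support the variant allele.
    return call.alt_reads < call.depth


def mixed_fraction(calls: Iterable[Call]) -> float:
    """Proportion of calls that are mixed (0.0 for an empty call set)."""
    calls = list(calls)
    if not calls:
        return 0.0
    return sum(is_mixed(c) for c in calls) / len(calls)


def concordance(raw: Iterable[Call], trimmed: Iterable[Call]) -> float:
    """Share of sites (union of both call sets) where both called the same base."""
    raw_alt = {(c.chrom, c.pos): c.alt for c in raw}
    trim_alt = {(c.chrom, c.pos): c.alt for c in trimmed}
    sites = raw_alt.keys() | trim_alt.keys()
    if not sites:
        return 1.0
    same = sum(1 for s in sites
               if s in raw_alt and s in trim_alt and raw_alt[s] == trim_alt[s])
    return same / len(sites)


raw_calls = [Call("chr1", 100, "A", 30, 30), Call("chr1", 200, "T", 18, 20),
             Call("chr1", 300, "G", 25, 25)]
trimmed_calls = [Call("chr1", 100, "A", 28, 28), Call("chr1", 200, "T", 19, 19),
                 Call("chr1", 400, "C", 10, 10)]
print(concordance(raw_calls, trimmed_calls))  # 0.5: sites 100 and 200 agree out of 4
print(mixed_fraction(raw_calls), mixed_fraction(trimmed_calls))  # 1/3 vs 0.0
```

Keying on (chromosome, position) and comparing the called base keeps the comparison independent of read support. The mixed-call fraction, by contrast, depends only on read support. This lets the two measures diverge in the way the abstract describes.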

Updated: 2020-12-22