Generalizable characteristics of false-positive bacterial variant calls
Microbial Genomics ( IF 3.9 ) Pub Date : 2021-08-04 , DOI: 10.1099/mgen.0.000615
Stephen J Bush

Minimizing false positives is a critical issue in variant calling, as no method is without error. It is common practice to post-process a variant-call file (VCF) using hard filter criteria intended to discriminate true-positive (TP) from false-positive (FP) calls. These are applied on the simple principle that certain characteristics are disproportionately represented among the set of FP calls and that a user-chosen threshold can maximize the number of FP calls detected. To provide guidance on this issue, this study empirically characterized all false SNP and indel calls made using real Illumina sequencing data from six disparate species and 166 variant-calling pipelines (the combination of 14 read aligners with up to 13 different variant callers, plus four ‘all-in-one’ pipelines). We did not seek to optimize filter thresholds but instead to draw attention to those filters of greatest efficacy and the pipelines to which they may most usefully be applied. In this respect, this study acts as a coda to our previous benchmarking evaluation of bacterial variant callers, and provides general recommendations for effective practice. The results suggest that, of the pipelines analysed in this study, the most straightforward way of minimizing false positives would simply be to use Snippy. We also find that a disproportionate number of false calls, irrespective of the variant-calling pipeline, are located in the vicinity of indels, and highlight this as an issue for future development.
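As a rough illustration of what hard filtering a VCF means in practice, the following Python sketch drops calls below a quality threshold and SNPs that lie within a fixed window of an indel. It is not the study's method: the function names, the `min_qual=30.0` and `indel_window=10` defaults, and the example records are all assumptions chosen for illustration, and the abstract explicitly does not recommend specific thresholds.

```python
from typing import NamedTuple


class Call(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float


def parse_vcf(text: str) -> list[Call]:
    """Parse the fixed VCF columns; assumes one ALT allele per record."""
    calls = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        f = line.split("\t")
        # A missing QUAL (".") is treated as 0 so that it fails the filter.
        qual = 0.0 if f[5] == "." else float(f[5])
        calls.append(Call(f[0], int(f[1]), f[3], f[4], qual))
    return calls


def is_indel(call: Call) -> bool:
    return len(call.ref) != len(call.alt)


def hard_filter(calls, min_qual=30.0, indel_window=10):
    """Keep calls with QUAL >= min_qual; drop SNPs within indel_window bp of an indel.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    indel_sites = [(c.chrom, c.pos) for c in calls if is_indel(c)]
    kept = []
    for c in calls:
        if c.qual < min_qual:
            continue
        near_indel = not is_indel(c) and any(
            chrom == c.chrom and abs(pos - c.pos) <= indel_window
            for chrom, pos in indel_sites
        )
        if not near_indel:
            kept.append(c)
    return kept


# Hypothetical records, invented for this example.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\t.",   # isolated SNP, passes
    "chr1\t200\t.\tAT\tA\t60\t.\t.",  # deletion, passes the QUAL filter
    "chr1\t205\t.\tC\tT\t55\t.\t.",   # SNP 5 bp from the deletion, dropped
    "chr1\t400\t.\tG\tC\t12\t.\t.",   # low-quality SNP, dropped
])

if __name__ == "__main__":
    for call in hard_filter(parse_vcf(EXAMPLE_VCF)):
        print(call.chrom, call.pos, call.ref, call.alt, call.qual)
```

In this sketch indels themselves are kept whenever they pass the quality threshold; only neighbouring SNPs are treated as suspect. A real workflow would more likely rely on an established VCF toolkit than on hand-rolled parsing.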

Updated: 2021-08-05