Benchmarking variant callers in next-generation and third-generation sequencing analysis.
Briefings in Bioinformatics (IF 9.5), Pub Date: 2020-07-23, DOI: 10.1093/bib/bbaa148
Surui Pei 1 , Tao Liu 2 , Xue Ren 2 , Weizhong Li 3 , Chongjian Chen 2 , Zhi Xie 4

DNA variants are an important source of genetic variation among individuals. Next-generation sequencing (NGS) is the most popular technology for genome-wide variant calling, and third-generation sequencing (TGS) has recently been used in genetic studies as well. Although many variant callers are available, no single caller can call both types of variants (SNPs and InDels) on NGS or TGS data with high sensitivity and specificity. In this study, we systematically evaluated 11 variant callers on 12 NGS and TGS datasets. For germline variant calling on NGS data, we tested the DNAseq and DNAscope modes from Sentieon, the HaplotypeCaller mode from GATK and the WGS mode from DeepVariant. All four callers had comparable performance on NGS data, and we recommend 30× coverage for WGS data. For germline variant calling on TGS data, we tested the DNAseq mode from Sentieon, the HaplotypeCaller mode from GATK and the PACBIO mode from DeepVariant. All three callers had similar performance in SNP calling, while DeepVariant outperformed the others in InDel calling. TGS detected more variants than NGS, particularly in complex and repetitive regions. For somatic variant calling on NGS data, we tested the TNscope and TNseq modes from Sentieon, the Mutect2 mode from GATK, NeuSomatic, VarScan2 and Strelka2. TNscope and Mutect2 outperformed the other callers. Higher tumor sample purity (from 10% to 20%) significantly increased recall. Finally, we compared the computational costs of the callers; Sentieon required the least. These results suggest that careful selection of a tool and its parameters is needed for accurate SNP or InDel calling under different scenarios.
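The abstract reports caller performance in terms of sensitivity (recall) and specificity, separately for SNPs and InDels. As a minimal sketch of how such numbers can be derived, the snippet below compares a set of called variants against a truth set by exact match. This is an illustration only, not the evaluation pipeline used in the study: the names `Variant`, `Metrics`, `benchmark_calls` and `is_indel` are hypothetical, and real benchmarks typically also normalize variant representation and restrict the comparison to confident regions.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float
    f1: float


def is_indel(variant: Variant) -> bool:
    """Treat a variant as an InDel when ref and alt alleles differ in length."""
    _, _, ref, alt = variant
    return len(ref) != len(alt)


def benchmark_calls(truth: Set[Variant], calls: Set[Variant]) -> Metrics:
    """Compare called variants with a truth set by exact match (a simplification)."""
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # present in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(tp, fp, fn, precision, recall, f1)


def benchmark_by_type(truth: Set[Variant], calls: Set[Variant]) -> dict:
    """Report metrics separately for SNPs and InDels."""
    return {
        "SNP": benchmark_calls(
            {v for v in truth if not is_indel(v)},
            {v for v in calls if not is_indel(v)},
        ),
        "InDel": benchmark_calls(
            {v for v in truth if is_indel(v)},
            {v for v in calls if is_indel(v)},
        ),
    }
```

Splitting the comparison by variant type mirrors the way the abstract reports results: a caller can match the others on SNPs while differing on InDels, which a single pooled score would hide.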

Updated: 2020-07-23