Set-theory based benchmarking of three different variant callers for targeted sequencing
BMC Bioinformatics (IF 2.9) Pub Date: 2021-01-07, DOI: 10.1186/s12859-020-03926-3
Jose Arturo Molina-Mora, Mariela Solano-Vargas

Next-generation sequencing (NGS) technologies have improved the study of hereditary diseases. Because evaluating bioinformatics pipelines is not straightforward, NGS demands effective strategies for analyzing data that are of paramount relevance for decision making in clinical scenarios. Following the benchmarking framework of the Global Alliance for Genomics and Health (GA4GH), we implemented a new, simple, and user-friendly set-theory based method to assess variant callers using a gold standard variant set and high-confidence regions. As a model, we used TruSight Cardio kit sequencing data of the reference genome NA12878. This targeted sequencing kit is used to identify variants in key genes related to Inherited Cardiac Conditions (ICCs), a group of cardiovascular diseases with high rates of morbidity and mortality. We implemented and compared three variant calling pipelines (Isaac, Freebayes, and VarScan). Performance metrics computed with our set-theory approach showed high-resolution pipelines and revealed: (1) a perfect recall of 1.000 for all three pipelines, (2) very high precision values, i.e. 0.987 for Freebayes, 0.928 for VarScan, and 1.000 for Isaac, when compared with the reference material, and (3) a ROC curve analysis with AUC > 0.94 in all cases. Moreover, significant differences were found among the three pipelines. Overall, the results indicate that the three pipelines were able to recognize the expected variants in the gold standard data set. Our set-theory approach to calculating metrics showed that all three selected pipelines identified the expected ICC-related variants, but the results were completely dependent on the algorithms. We emphasize the importance of assessing pipelines using gold standard materials to achieve the most reliable results for clinical application.
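
The abstract does not spell out the set definitions, so the following is only a minimal sketch of how set-based benchmarking metrics are commonly computed, not the authors' implementation. It assumes each variant has already been normalized and restricted to the high-confidence regions, and that it can be represented as a `(chrom, pos, ref, alt)` tuple. The `benchmark` function, the `Metrics` type, and the toy variants are hypothetical and do not come from the paper.

```python
from typing import NamedTuple, Set, Tuple

# Assumed variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    tp: int
    fp: int
    fn: int
    precision: float
    recall: float
    f1: float


def benchmark(calls: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare a call set against a gold standard set using set operations."""
    tp = len(calls & truth)   # intersection: called and in the gold standard
    fp = len(calls - truth)   # difference: called but not in the gold standard
    fn = len(truth - calls)   # difference: in the gold standard but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(tp, fp, fn, precision, recall, f1)


# Toy data for illustration only (not taken from the paper).
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
calls = truth | {("chr2", 900, "T", "C")}  # one extra, unconfirmed call

result = benchmark(calls, truth)
print(result)
```

In this toy case every gold standard variant is recovered and one extra call is made, so recall is 1.0 and precision is 0.75. Exact tuple matching is a simplification: real comparisons also have to handle differing representations of the same variant and genotype matching.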

Updated: 2021-01-08