Benchmarking of computational error-correction methods for next-generation sequencing data
Genome Biology (IF 10.1), Pub Date: 2020-03-17, DOI: 10.1186/s13059-020-01988-3
Keith Mitchell 1, Jaqueline J Brito 2, Igor Mandric 1,3, Qiaozhen Wu 4, Sergey Knyazev 3, Sei Chang 1, Lana S Martin 2, Aaron Karlsberg 2, Ekaterina Gerasimov 3, Russell Littman 5, Brian L Hill 1, Nicholas C Wu 6, Harry Taegyun Yang 1, Kevin Hsieh 1, Linus Chen 1, Eli Littman 1, Taylor Shabani 1, German Enik 1, Douglas Yao 7, Ren Sun 8, Jan Schroeder 9, Eleazar Eskin 1, Alex Zelikovsky 3,10, Pavel Skums 3, Mihai Pop 11, Serghei Mangul 2

Background: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown.

Results: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods.

Conclusions: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
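The abstract refers to precision and sensitivity without defining them. The following is a minimal sketch of how such metrics could be computed per base, assuming the error-free (true) read, the raw read, and the corrected read have equal length and are already aligned position by position. The names `Counts`, `classify_bases`, `precision`, and `sensitivity` are hypothetical and are not taken from the paper's code; the classification rules are a common convention and may differ in detail from the authors' evaluation.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int = 0  # erroneous base restored to the true base
    fp: int = 0  # correct base changed into a wrong base
    fn: int = 0  # erroneous base left unfixed or changed to another wrong base
    tn: int = 0  # correct base left untouched


def classify_bases(true_read: str, raw_read: str, corrected_read: str) -> Counts:
    """Classify every position of one read (assumes equal-length, aligned reads)."""
    if not (len(true_read) == len(raw_read) == len(corrected_read)):
        raise ValueError("this sketch assumes equal-length, pre-aligned reads")
    counts = Counts()
    for t, r, c in zip(true_read, raw_read, corrected_read):
        if r != t:            # the raw base was a sequencing error
            if c == t:
                counts.tp += 1
            else:
                counts.fn += 1
        else:                 # the raw base was already correct
            if c == t:
                counts.tn += 1
            else:
                counts.fp += 1
    return counts


def precision(c: Counts) -> float:
    """Fraction of effective changes that were right: TP / (TP + FP)."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def sensitivity(c: Counts) -> float:
    """Fraction of true errors that were fixed: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


# Toy example (made-up reads): two errors in the raw read, one fixed,
# one missed, and one correct base wrongly altered by the corrector.
example = classify_bases("ACGTACGT", "ACGAACGC", "ACGTTCGC")
```

In this toy case one error is fixed, one is missed, and one correct base is damaged, so both `precision(example)` and `sensitivity(example)` evaluate to 0.5. A method that changes many bases tends to raise sensitivity at the cost of precision, which is the trade-off the Conclusions mention.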

Updated: 2020-03-17