A hybrid correcting method considering heterozygous variations by a comprehensive probabilistic model
BMC Genomics (IF 4.4), Pub Date: 2020-11-18, DOI: 10.1186/s12864-020-07008-9
Jiaqi Liu, Jiayin Wang, Xiao Xiao, Xin Lai, Daocheng Dai, Xuanping Zhang, Xiaoyan Zhu, Zhongmeng Zhao, Juan Wang, Zhimin Li

Third-generation sequencing technology, with its longer read lengths, marks a substantial advance over next-generation sequencing and has greatly benefited biological research. However, third-generation sequencing data carry a high sequencing error rate, which inevitably affects downstream analysis. Although error rates have improved in recent years, large amounts of data were already produced at high error rates, and discarding them would be a huge waste. Error correction for third-generation sequencing data is therefore especially important. Existing error correction methods perform poorly at heterozygous sites, which are ubiquitous in diploid and polyploid organisms. Error correction algorithms that handle heterozygous loci are thus lacking, especially at low coverage. In this article, we propose an error correction method named QIHC. QIHC is a hybrid correction method that requires both next-generation and third-generation sequencing data. QIHC greatly enhances the sensitivity of distinguishing heterozygous sites from sequencing errors, which leads to high error correction accuracy. To achieve this, QIHC establishes a set of probabilistic models based on a Bayesian classifier to estimate the heterozygosity of a site, and makes a judgment by calculating the posterior probabilities. The proposed method consists of three modules, which respectively generate a pseudo reference sequence, obtain the read alignments, and estimate the heterozygosity of the sites and correct the reads harboring them. The last module is the core of QIHC and is designed to handle the multiple cases that can arise at a heterozygous site. The other two modules allow the reads to be mapped to the pseudo reference sequence, which to some extent overcomes the inefficiency of the multiple mappings adopted by existing error correction methods.
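The idea of judging a site by posterior probabilities can be illustrated with a minimal sketch. This is not QIHC's actual model, which the abstract does not specify. It assumes a two-class Bayesian classifier with binomial likelihoods, and the function names, the error rate of 0.15, and the heterozygosity prior of 0.01 are all hypothetical values chosen for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def site_posteriors(alt_count: int, depth: int,
                    error_rate: float = 0.15,
                    het_prior: float = 0.01) -> dict:
    """Posterior probability that a site is heterozygous vs. an error.

    Assumptions (illustrative only, not taken from the paper):
    - at a homozygous site, each read shows the alternative base
      with probability `error_rate`;
    - at a heterozygous site, each read shows the alternative base
      with probability 0.5;
    - reads are independent, so the counts are binomial.
    """
    likelihood = {
        "error": binom_pmf(alt_count, depth, error_rate),
        "het": binom_pmf(alt_count, depth, 0.5),
    }
    prior = {"error": 1.0 - het_prior, "het": het_prior}

    # Bayes' rule: posterior is prior times likelihood, normalised.
    joint = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}


def classify_site(alt_count: int, depth: int, **kwargs) -> str:
    """Return the class with the larger posterior probability."""
    post = site_posteriors(alt_count, depth, **kwargs)
    return max(post, key=post.get)


if __name__ == "__main__":
    # A balanced site and a site with a single discordant read.
    print(classify_site(alt_count=10, depth=20))
    print(classify_site(alt_count=1, depth=20))
```

Under these assumptions, a site where about half of the reads carry the alternative base is classified as heterozygous, while a site with only a few discordant reads is attributed to sequencing error. At low depth the two likelihoods are harder to separate, so the prior has more influence on the result.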
To verify the performance of our method, we compared QIHC with Canu and Jabba in several respects. Because QIHC is a hybrid correction method, we first conducted a group of experiments under different coverages of the next-generation sequencing data, in which QIHC is far ahead of Jabba in accuracy. We then varied the coverage of the third-generation sequencing data and compared the performance of Canu, Jabba and QIHC again. QIHC outperforms the other two methods in the accuracy of both correcting sequencing errors and identifying heterozygous sites, especially at low coverage. We also compared Canu and QIHC under different error rates of the third-generation sequencing data, and QIHC still performs better. Therefore, QIHC is superior to the existing error correction methods when heterozygous sites exist.

Updated: 2020-11-19