ESREEM: Efficient Short Reads Error Estimation Computational Model for Next-generation Genome Sequencing
Current Bioinformatics (IF 4), Pub Date: 2021-01-31, DOI: 10.2174/1574893615999200614171832
Muhammad Tahir 1 , Muhammad Sardaraz 1 , Zahid Mehmood 2 , Muhammad Saud Khan 1

Aims: To assess the error profile of NGS data generated by high-throughput sequencing machines.

Background: Short-read sequencing data from Next Generation Sequencing (NGS) are currently being generated by a number of research projects. Characterizing the errors produced by NGS platforms and deriving accurate genetic variation from reads are two interdependent phases. Both are highly significant in various analyses, such as genome sequence assembly, SNP calling, evolutionary studies, and haplotype inference. Systematic and random errors show a distinct incidence profile for each sequencing platform, namely Illumina sequencing, Pacific Biosciences, 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Ion Torrent sequencing, and Oxford Nanopore sequencing. Advances in NGS deliver massive volumes of data, and with them more errors. Some proportion of these errors may mimic genuine biological signals, e.g., mutations, and may subsequently invalidate the results. Various independent applications have been proposed to correct sequencing errors. A systematic analysis of these algorithms shows that state-of-the-art models are still lacking.

Objective: In this paper, an efficient error estimation computational model called ESREEM is proposed to assess the error rates in NGS data.

Methods: The proposed model rests on the premise that a linear regression relationship exists between the number of reads containing errors and the number of reads sequenced. The model is based on a probabilistic error model integrated with the Hidden Markov Model (HMM).
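
The linear relationship described in the Methods can be illustrated with a minimal sketch. This is not the ESREEM implementation, which the abstract does not specify; it only shows an ordinary least-squares fit of erroneous-read counts against total reads sequenced. The function name `fit_line` and the counts below are hypothetical, chosen purely for illustration, and the HMM component is not modeled here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical counts (NOT data from the paper): total reads sequenced
# and the number of those reads that contain one or more errors.
reads_sequenced = [1000, 2000, 4000, 8000]
reads_with_errors = [21, 39, 82, 158]

slope, intercept = fit_line(reads_sequenced, reads_with_errors)
# Under the linear assumption, the slope approximates the fraction of
# reads that contain errors.
print(f"estimated erroneous-read fraction: {slope:.4f}")
```

Under this assumption, the fitted slope can be read as an estimate of the per-read error incidence, while a large intercept or poor fit would suggest the linear premise does not hold for a given dataset.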

Results: The proposed model is evaluated on several benchmark datasets and the results obtained are compared with state-of-the-art algorithms.

Conclusion: Analysis of the experimental results shows that the proposed model estimates errors efficiently and runs in less time than the compared algorithms.




Updated: 2021-01-31