A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads.
Journal of Computational Biology (IF 1.7), Pub Date: 2020-08-04, DOI: 10.1089/cmb.2019.0272
Katarzyna Górczak 1, 2 , Jürgen Claesen 1, 3 , Tomasz Burzykowski 1, 4

RNA sequencing (RNA-seq) is widely used to study gene, transcript, or exon expression. To quantify expression levels, millions of short sequenced reads must be mapped back to a reference genome or transcriptome. Read mapping finds the location to which a read is identical or similar. Based on this alignment, expression summaries, that is, read counts, are generated. However, a read may match multiple locations. Such ambiguously mapped reads are often ignored in the analysis, which can discard information and bias expression estimates. We present the general principles underlying multiread allocation and unbiased estimation of the expression level of genes, exons, or transcripts in the presence of reads that map to multiple locations. These principles are derived from a theoretical concept that identifies important sources of information, such as the number of uniquely mapped reads, the total target length, and the length of the shared target regions. We show with simulation studies that methods incorporating some or all of these sources of information estimate the expression levels of genes, exons, and/or transcripts with higher precision and accuracy than methods that do not use this information. The sources of information identified here should therefore be taken into account by methods that estimate the abundance of genes, exons, and/or transcripts.
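To make the three sources of information concrete, here is a minimal sketch of one simple allocation heuristic: multireads shared by a group of targets are split in proportion to each target's density of uniquely mapped reads over its non-shared region. This is an illustrative assumption, not the estimator derived in the paper; the function name `allocate_multireads`, its parameters, and the numbers in the example are all hypothetical.

```python
def allocate_multireads(unique_counts, lengths, shared_length, n_multireads):
    """Split multireads among targets that share a region.

    Hypothetical heuristic (not the paper's estimator): each target's share
    of the multireads is proportional to its unique-read density, i.e. its
    uniquely mapped reads divided by the length of its non-shared region.

    unique_counts: dict target -> number of uniquely mapped reads
    lengths:       dict target -> total target length
    shared_length: length of the region common to all listed targets
    n_multireads:  number of reads mapping ambiguously to the shared region
    """
    densities = {}
    for target, count in unique_counts.items():
        unique_length = lengths[target] - shared_length
        if unique_length <= 0:
            raise ValueError(f"target {target!r} has no unique region")
        densities[target] = count / unique_length

    total_density = sum(densities.values())
    if total_density == 0:
        # No unique evidence at all: fall back to an even split.
        shares = {t: 1.0 / len(densities) for t in densities}
    else:
        shares = {t: d / total_density for t, d in densities.items()}

    # Estimated count = unique reads + allocated fraction of the multireads.
    return {t: unique_counts[t] + n_multireads * shares[t] for t in unique_counts}


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    estimates = allocate_multireads(
        unique_counts={"A": 90, "B": 30},
        lengths={"A": 1000, "B": 700},
        shared_length=100,
        n_multireads=20,
    )
    for target, value in estimates.items():
        print(target, round(value, 2))
```

In this made-up example, target A has twice the unique-read density of target B (0.1 versus 0.05 reads per base), so it receives two thirds of the 20 multireads, giving estimates of about 103.33 for A and 36.67 for B. Discarding the multireads instead would leave the counts at 90 and 30 and lose those 20 reads entirely.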

Updated: 2020-08-08