Effect of high variation in transcript expression on identifying differentially expressed genes in RNA-seq analysis
Annals of Human Genetics (IF 1.0), Pub Date: 2021-08-03, DOI: 10.1111/ahg.12441
Weitong Cui 1 , Huaru Xue 1 , Yifan Geng 1, 2 , Jing Zhang 1 , Yajun Liang 1 , Xuewen Tian 3 , Qinglu Wang 1, 3

Considerable effort has gone into algorithms for RNA-seq data that improve the accuracy and efficiency of differential expression (DE) analysis. However, no consensus has been reached on the proper fold change and adjusted p-value thresholds for filtering differentially expressed genes (DEGs). It is generally believed that the more stringent the filtering threshold, the more reliable the result of a DE analysis. We analyzed the impact of both adjusted p-value and fold change thresholds on DE analyses, using RNA-seq data for three different cancer types from The Cancer Genome Atlas (TCGA) database, and found that, for a given sample size, the reproducibility of DE results became poorer when more stringent thresholds were applied. Whichever threshold level was applied, the overlap rates of DEGs were generally lower for small sample sizes than for large sample sizes. Analysis of the raw read counts showed that the transcript expression of the same gene varied widely across samples, in tumor groups and normal groups alike, which caused fold change values and adjusted p-values to fluctuate drastically when different sets of samples were used. Overall, more stringent thresholds did not yield more reliable DEGs because of the high variation in transcript expression, and the reliability of DEGs obtained with small sample sizes was more susceptible to this variation. Therefore, less stringent thresholds are recommended for screening DEGs. Moreover, large sample sizes should be considered in RNA-seq experimental designs to reduce the interfering effect of variation in transcript expression on DEG identification.
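
As a rough illustration of the comparison described in the abstract, the following Python sketch filters two DE result sets with a lenient and a strict threshold pair and then measures how much the resulting DEG lists overlap. Everything in it is an assumption made for illustration: the gene names and numbers are invented toy values (not TCGA data or results from the paper), the helper names `filter_degs` and `overlap_rate` are hypothetical, and the overlap rate is taken here to be shared genes divided by the union of both lists, which may differ from the definition the authors used.

```python
from typing import Dict, Set, Tuple

# Maps gene -> (log2 fold change, adjusted p-value).
# All values below are invented toy numbers, not data from the paper.
DEResult = Dict[str, Tuple[float, float]]


def filter_degs(result: DEResult, min_abs_log2fc: float, max_padj: float) -> Set[str]:
    """Return the genes that pass both the fold change and adjusted p-value cutoffs."""
    return {
        gene
        for gene, (log2fc, padj) in result.items()
        if abs(log2fc) >= min_abs_log2fc and padj <= max_padj
    }


def overlap_rate(a: Set[str], b: Set[str]) -> float:
    """Assumed definition: genes shared by both lists divided by genes in either list."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two hypothetical DE analyses of the same comparison, each run on a
# different subset of samples, so the estimates fluctuate slightly.
run_a: DEResult = {
    "GENE1": (3.1, 0.0001),
    "GENE2": (1.2, 0.03),
    "GENE3": (2.2, 0.004),
    "GENE4": (0.4, 0.6),
}
run_b: DEResult = {
    "GENE1": (2.9, 0.0002),
    "GENE2": (1.6, 0.01),
    "GENE3": (1.7, 0.02),
    "GENE4": (0.3, 0.7),
}

# Lenient thresholds: |log2FC| >= 1 and adjusted p <= 0.05.
lenient = overlap_rate(filter_degs(run_a, 1.0, 0.05), filter_degs(run_b, 1.0, 0.05))

# Strict thresholds: |log2FC| >= 2 and adjusted p <= 0.01.
strict = overlap_rate(filter_degs(run_a, 2.0, 0.01), filter_degs(run_b, 2.0, 0.01))

print(f"lenient overlap: {lenient:.2f}")
print(f"strict overlap:  {strict:.2f}")
```

The toy values were chosen so that GENE3 sits near the strict cutoffs: a small fluctuation between the two runs moves it across the threshold in one run but not the other, which lowers the overlap under strict filtering while the lenient lists still agree. This only demonstrates the mechanism; it says nothing about how large the effect is in real data.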

Updated: 2021-08-03