Benchmark of long non-coding RNA quantification for RNA sequencing of cancer samples.
GigaScience ( IF 11.8 ) Pub Date : 2019-12-01 , DOI: 10.1093/gigascience/giz145
Hong Zheng 1 , Kevin Brennan 1 , Mikel Hernaez 2 , Olivier Gevaert 1, 3

BACKGROUND: Long non-coding RNAs (lncRNAs) are emerging as important regulators of various biological processes. Many studies have exploited public resources such as RNA sequencing (RNA-Seq) data in The Cancer Genome Atlas to study lncRNAs in cancer, so it is crucial to choose the optimal method for accurate expression quantification.

RESULTS: In this study, we compared the performance of the pseudoalignment methods Kallisto and Salmon, the alignment-based transcript quantification method RSEM, and the alignment-based gene quantification methods HTSeq and featureCounts, in combination with the read aligners STAR, Subread, and HISAT2, for lncRNA quantification, applying them to both un-stranded and stranded RNA-Seq datasets. Full transcriptome annotation, including protein-coding and non-coding RNAs, greatly improves the specificity of lncRNA expression quantification. Pseudoalignment methods and RSEM outperform HTSeq and featureCounts for lncRNA quantification in both sample-level and gene-level comparisons, regardless of RNA-Seq protocol type, choice of aligner, and transcriptome annotation. Pseudoalignment methods and RSEM detect more lncRNAs and correlate highly with simulated ground truth. In contrast, HTSeq and featureCounts often underestimate lncRNA expression. Antisense lncRNAs are poorly quantified by alignment-based gene quantification methods, a limitation that can be mitigated by using stranded protocols and pseudoalignment methods.

CONCLUSIONS: Considering both consistency with ground truth and computational resources, we recommend the pseudoalignment methods Kallisto or Salmon, combined with full transcriptome annotation, for RNA-Seq analysis of lncRNAs.
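
The comparison against simulated ground truth can be illustrated with a minimal sketch. The abstract does not state which correlation measure the study used, so the Spearman rank correlation below is an assumption, and the expression values and variable names (`truth`, `estimate_a`, `estimate_b`) are hypothetical, not data from the paper.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical per-lncRNA expression values (for illustration only).
truth = [0.0, 1.5, 3.0, 12.0, 40.0]       # simulated ground truth
estimate_a = [0.1, 1.2, 3.5, 10.0, 42.0]  # preserves the true ordering
estimate_b = [0.0, 0.0, 3.5, 1.0, 20.0]   # underestimates several genes

print(spearman(truth, estimate_a))
print(spearman(truth, estimate_b))
```

In this toy example the estimate that keeps the true ordering scores higher than the one that underestimates several genes, which mirrors the kind of gap the abstract describes between methods.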

Updated: 2019-12-06