LIQA: Long-read Isoform Quantification and Analysis
bioRxiv - Bioinformatics Pub Date : 2021-04-05 , DOI: 10.1101/2020.09.09.289793
Yu Hu , Li Fang , Xuelian Chen , Jiang F. Zhong , Mingyao Li , Kai Wang

Long-read RNA sequencing (RNA-seq) technologies have made it possible to sequence full-length transcripts, facilitating the exploration of isoform-specific gene expression (isoform relative abundance and isoform-level TPM) over conventional short-read RNA-seq. However, long-read RNA-seq suffers from a high per-base error rate, the presence of chimeric reads or alternative alignments, and other biases, which require different analysis methods than short-read RNA-seq. Here we present LIQA (Long-read Isoform Quantification and Analysis), an Expectation-Maximization based statistical method to quantify isoform expression and detect differential alternative splicing (DAS) events using long-read RNA-seq data. Rather than summarizing isoform-specific read counts directly, as short-read methods do, LIQA incorporates base-pair quality scores and isoform-specific read length information to assign different weights across reads, reflecting alignment confidence. Moreover, LIQA can detect DAS events between conditions using isoform usage estimates. We evaluated LIQA's performance on simulated data and demonstrated that it outperforms other approaches in characterizing isoforms with low read coverage and in detecting DAS events between two groups. We also generated one direct mRNA sequencing dataset and one cDNA sequencing dataset using the Oxford Nanopore long-read platform, both with paired short-read RNA-seq data and qPCR data on selected genes, and we demonstrated that LIQA performs well in isoform discovery and quantification. Finally, we evaluated LIQA on a PacBio dataset from esophageal squamous epithelial cells, and demonstrated that LIQA recovered DAS events that were not detected in short-read data. In summary, LIQA leverages the power of long-read RNA-seq and achieves higher accuracy in estimating isoform abundance than existing approaches, especially for isoforms with low coverage and biased read distribution.
LIQA is freely available at https://github.com/WGLab/LIQA.
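
To illustrate the general idea of read-weighted Expectation-Maximization described in the abstract, here is a minimal sketch in Python. It is not LIQA's implementation: the function name `weighted_em`, the `compat` and `weights` inputs, and the way a single confidence weight per read enters the update are all assumptions made for illustration, and isoform-specific read length modeling is omitted entirely.

```python
def weighted_em(compat, weights, n_iso, n_iter=200, tol=1e-9):
    """Estimate isoform relative abundances with a read-weighted EM.

    compat[r]  : list of isoform indices that read r is compatible with
                 (hypothetical input, e.g. derived from alignments).
    weights[r] : confidence weight in (0, 1] for read r (hypothetical,
                 e.g. derived from base quality scores).
    """
    theta = [1.0 / n_iso] * n_iso  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_iso
        # E-step: split each read across its compatible isoforms in
        # proportion to the current abundances, scaled by its weight.
        for isoforms, w in zip(compat, weights):
            denom = sum(theta[i] for i in isoforms)
            if denom == 0.0:
                continue  # read has no support under current estimates
            for i in isoforms:
                counts[i] += w * theta[i] / denom
        total = sum(counts)
        if total == 0.0:
            break
        # M-step: renormalize the weighted fractional counts.
        new_theta = [c / total for c in counts]
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy example: two isoforms, one unique read each, and one ambiguous
# low-confidence read shared between them.
example = weighted_em([[0], [0, 1], [1]], [1.0, 0.5, 1.0], n_iso=2)
print(example)  # [0.5, 0.5]
```

In this sketch a low-weight read contributes less to the fractional counts than a high-weight read, which is one simple way to let alignment confidence influence the abundance estimates.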

Updated: 2021-04-05