Freddie: Annotation-independent Detection and Discovery of Transcriptomic Alternative Splicing Isoforms
bioRxiv - Bioinformatics Pub Date : 2021-01-21 , DOI: 10.1101/2021.01.20.427493
Baraa Orabi , Brian McConeghy , Cedric Chauve , Faraz Hach

Alternative splicing (AS) is an important mechanism in the development of many cancers, as novel or aberrant AS patterns play an important role as independent onco-drivers. In addition, cancer-specific AS is potentially an effective target of personalized cancer therapeutics. However, detecting AS events remains a challenging task, especially if these AS events are not pre-annotated. This is exacerbated by the fact that existing transcriptome annotation databases are far from comprehensive, especially with regard to cancer-specific AS. Additionally, traditional sequencing technologies are severely limited by the short length of the generated reads, which rarely span more than a single splice junction site. Given these challenges, transcriptomic long-read (LR) sequencing offers promising potential for the detection and discovery of AS. We present Freddie, a computational annotation-independent isoform discovery and detection tool. Freddie takes as input the transcriptomic LR sequencing of a sample, specifically the genomic alignment of the LRs generated by a splice aligner, and computes a set of isoforms for that sample. It first partitions the reads into sets that can be processed independently and in parallel. For each partition, Freddie segments the genomic alignment of the reads into canonical exon segments. The goal of this segmentation is to be able to represent any potential isoform as a subset of these canonical exons. This segmentation is formulated as an optimization problem and is solved with a dynamic programming algorithm. Then, Freddie reconstructs the isoforms by jointly clustering and error-correcting the reads, using the canonical segmentation as a succinct representation. The clustering and error-correcting step is formulated as an optimization problem, the Minimum Error Clustering into Isoforms (MErCi) problem, and is solved using Integer Linear Programming (ILP).
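The first two stages described above (partitioning the reads, then representing each read over a shared set of segments) can be illustrated with a minimal Python sketch. This is not Freddie's code. The names `partition_reads`, `naive_breakpoints`, and `to_segment_vector` are hypothetical. The sketch assumes that reads are independent when their genomic spans do not overlap, and it uses every observed block boundary as a segment boundary in place of the optimized segmentation that the abstract says is solved by dynamic programming. The more-than-half coverage rule is likewise an assumption made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic interval
Read = List[Interval]       # aligned blocks of one read, sorted by start


def partition_reads(reads: List[Read]) -> List[List[int]]:
    """Group read indices whose genomic spans overlap, directly or transitively.

    Assumption: non-overlapping groups can be processed independently.
    """
    order = sorted(range(len(reads)), key=lambda i: reads[i][0][0])
    partitions: List[List[int]] = []
    current_end = -1
    for i in order:
        start, end = reads[i][0][0], reads[i][-1][1]
        if not partitions or start >= current_end:
            partitions.append([])          # no overlap: open a new partition
            current_end = end
        else:
            current_end = max(current_end, end)
        partitions[-1].append(i)
    return partitions


def naive_breakpoints(reads: List[Read]) -> List[int]:
    """Every distinct block boundary (a stand-in, not the optimized segmentation)."""
    return sorted({pos for read in reads for block in read for pos in block})


def to_segment_vector(read: Read, breakpoints: List[int]) -> List[int]:
    """Mark a segment 1 if the read's blocks cover more than half of it."""
    vector = []
    for seg_start, seg_end in zip(breakpoints, breakpoints[1:]):
        covered = sum(
            max(0, min(end, seg_end) - max(start, seg_start))
            for start, end in read
        )
        vector.append(1 if 2 * covered > seg_end - seg_start else 0)
    return vector


if __name__ == "__main__":
    reads = [
        [(100, 200), (300, 400)],   # two exons
        [(100, 200), (250, 400)],   # second block starts earlier
        [(1000, 1100)],             # far away, separate partition
    ]
    for group in partition_reads(reads):
        members = [reads[i] for i in group]
        points = naive_breakpoints(members)
        for read in members:
            print(read, to_segment_vector(read, points))
```

Under these assumptions each read becomes a short binary vector, and it is on vectors of this kind that a clustering and error-correction step such as the one the abstract describes could operate.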
On simulated datasets, we compare the performance of Freddie with that of other isoform detection tools that vary in their dependence on annotation databases. We show that Freddie outperforms the other tools in recall, including those given the complete ground truth annotation. In terms of false positive rate, Freddie performs comparably to the other tools. We also run Freddie on a transcriptomic LR dataset generated in-house from a prostate cancer cell line. Freddie detects a potentially novel Androgen Receptor isoform that includes a novel intron retention. We cross-validate this novel intron retention using orthogonal, publicly available short-read RNA-seq datasets. Availability: Freddie is open source and available at https://bitbucket.org/baraaorabi/freddie

Updated: 2021-01-22