High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.
Nature Genetics (IF 31.7), Pub Date: 2017-Dec-01, DOI: 10.1038/ng.3988
Julien Lagarde 1,2, Barbara Uszczynska-Ratajczak 1,2, Silvia Carbonell 3, Sílvia Pérez-Lluch 1,2, Amaya Abad 1,2, Carrie Davis 4, Thomas R Gingeras 4, Adam Frankish 5, Jennifer Harrow 5, Roderic Guigo 1,2, Rory Johnson 1,2

Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.

Updated: 2017-11-10