A high resolution single molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
bioRxiv - Plant Biology. Pub Date: 2021-09-07. DOI: 10.1101/2021.09.02.458763
Runxuan Zhang, Richard Kuo, Max Coulter, Cristiane P. G. Calixto, Juan Carlos Entizne, Wenbin Guo, Yamile Marquez, Linda Milne, Stefan Riegler, Akihiro Matsui, Maho Tanaka, Sarah Harvey, Yubang Gao, Theresa Wießner-Kroh, Martin Crespi, Katherine Denby, Asa ben Hur, Enamul Huq, Michael Jantsch, Artur Jarmolowski, Tino Koester, Sascha Laubinger, Qingshun Quinn Li, Lianfeng Gu, Motoaki Seki, Dorothee Staiger, Ramanjulu Sunkar, Zofia Szweykowska-Kulinska, Shih-Long Tu, Andreas Wachter, Robbie Waugh, Liming Xiong, Xiao-Ning Zhang, Anireddy S.N. Reddy, Andrea Barta, Maria Kalyna, John WS Brown

Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and for differential gene and transcript expression analysis. Single molecule long read sequencing technologies provide improved integrity of transcript structures, including alternative splicing, transcription start sites, and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, and incomplete cDNA synthesis.

Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 160k transcripts, twice as many as the best current Arabidopsis transcriptome, and includes over 1,500 novel genes. 79% of transcripts are from Iso-seq, with accurately defined splice junctions and transcription start and end sites. We developed novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provided a powerful feature for distinguishing correct splice junctions and removing false ones. Stratified approaches identified high confidence transcription start/end sites and removed fragmentary transcripts caused by degradation. AtRTD3 is a major improvement over existing transcriptomes, as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provided higher resolution of transcript expression profiling and identified cold- and light-induced differential transcription start and polyadenylation site usage.

Conclusions: AtRTD3 is the most comprehensive Arabidopsis transcriptome currently available. It improves the precision of analyses of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single molecule sequencing analysis in any species.
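
The abstract says that mismatch profiles around splice junctions help separate correct junctions from false ones, but it gives no parameters. As a rough illustration of that general idea only, and not the authors' implementation, the Python sketch below keeps a junction when at least one read supports it with no alignment error near the junction. The `Junction` record, the 10-base window, and the `min_clean_reads` threshold are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """One splice junction as observed in one aligned long read (hypothetical record)."""
    chrom: str
    donor: int          # last exonic position before the intron
    acceptor: int       # first exonic position after the intron
    mismatches: tuple   # offsets, relative to the junction, of mismatches/indels in the read


def is_clean(junction, window=10):
    """Return True when no alignment error lies within +/- window bases of the junction.

    The window size is an assumed value, not one taken from the abstract.
    """
    return all(abs(offset) > window for offset in junction.mismatches)


def filter_junctions(observations, window=10, min_clean_reads=1):
    """Keep junctions supported by at least min_clean_reads error-free observations."""
    support = {}
    for obs in observations:
        key = (obs.chrom, obs.donor, obs.acceptor)
        support.setdefault(key, 0)
        if is_clean(obs, window):
            support[key] += 1
    return {key for key, count in support.items() if count >= min_clean_reads}


# Toy data: the first junction has one clean read, the second has none.
observations = [
    Junction("Chr1", 100, 200, (25,)),
    Junction("Chr1", 100, 200, (3,)),
    Junction("Chr1", 300, 400, (-2, 4)),
]
kept = filter_junctions(observations)
```

With the toy data, only the junction at Chr1:100-200 is retained, because one of its reads has no error within 10 bases of the junction. A real pipeline would derive the mismatch offsets from the read alignments and would choose the window and support thresholds empirically.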

Updated: 2021-09-08