Deep learning of the back-splicing code for circular RNA formation.
Bioinformatics (IF 5.8), Pub Date: 2019-12-15, DOI: 10.1093/bioinformatics/btz382
Jun Wang, Liangjiang Wang

MOTIVATION: Circular RNAs (circRNAs) are a new class of endogenous RNAs in animals and plants. During pre-RNA splicing, the 5' and 3' termini of exon(s) can be covalently ligated to form circRNAs through back-splicing (head-to-tail splicing). CircRNAs can be conserved across species, show tissue- and developmental stage-specific expression patterns, and may be associated with human disease. However, the mechanism of circRNA formation remains unclear, although some sequence features have been shown to affect back-splicing.

RESULTS: In this study, by applying state-of-the-art machine learning techniques, we have developed the first deep learning model, DeepCirCode, to predict back-splicing for human circRNA formation. DeepCirCode uses a convolutional neural network (CNN) with nucleotide sequence as the input, and outperforms conventional machine learning algorithms such as support vector machine and random forest. Relevant features learnt by DeepCirCode are represented as sequence motifs, some of which match known human motifs involved in RNA splicing, transcription or translation. Analysis of these motifs shows that their distribution in RNA sequences can be important for back-splicing. Moreover, some of the human motifs appear to be conserved in mouse and fruit fly. The findings provide new insight into the back-splicing code for circRNA formation.

AVAILABILITY AND IMPLEMENTATION: All the datasets and source code for model construction are available at https://github.com/BioDataLearning/DeepCirCode.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
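
To make the idea of a CNN taking nucleotide sequence as input more concrete, the following is a minimal sketch in plain Python. It is not the DeepCirCode implementation (that code is in the repository linked above). It only illustrates, under assumptions, how a sequence can be one-hot encoded and scanned by a single convolutional filter that behaves like a short motif detector. The function names `one_hot` and `conv1d_scan`, the `gt_filter` weights, and the example sequence are all hypothetical and chosen for illustration.

```python
from typing import List

ALPHABET = "ACGT"  # column order of the one-hot encoding (an assumption of this sketch)


def one_hot(seq: str) -> List[List[float]]:
    """Encode a nucleotide sequence as a len(seq) x 4 matrix.

    U is treated as T; any other symbol (e.g. N) becomes an all-zero row.
    """
    seq = seq.upper().replace("U", "T")
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def conv1d_scan(encoded: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Slide one filter (width x 4) along the sequence without padding, then apply ReLU."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(ALPHABET))
        )
        scores.append(max(0.0, total))  # ReLU activation
    return scores


# Hypothetical hand-written filter that rewards a G followed by a T.
# In a trained CNN these weights would be learnt from data, not set by hand.
gt_filter = [
    [0.0, 0.0, 1.0, 0.0],  # position 1: G
    [0.0, 0.0, 0.0, 1.0],  # position 2: T
]

scores = conv1d_scan(one_hot("AAGGTAAGT"), gt_filter)
best_position = max(range(len(scores)), key=scores.__getitem__)
```

Here `scores` holds one activation per window of the sequence, and `best_position` is the 0-based start of the first highest-scoring window. Reading learnt filters as motifs works in the same spirit: windows that strongly activate a filter indicate the sequence pattern that filter has come to detect.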

Updated: 2020-01-13