Frontiers in Physics (IF 3.1), Pub Date: 2020-05-01, DOI: 10.3389/fphy.2020.00196
Sirajul Salekin, Milad Mostavi, Yu-Chiao Chiu, Yidong Chen, Jianqiu Zhang, Yufei Huang
Epitranscriptomics is an exciting field that studies different types of modifications in transcripts, and predicting such modification sites from the transcript sequence is of significant interest. However, the scarcity of positive sites for most modifications poses critical challenges for training robust algorithms. To circumvent this problem, we propose MR-GAN, a generative adversarial network (GAN)-based model that is trained in an unsupervised fashion on entire pre-mRNA sequences to learn a low-dimensional embedding of transcriptomic sequences. MR-GAN was then applied to extract embeddings of the sequences in a training dataset we created for nine epitranscriptome modifications, namely m6A, m1A, m1G, m2G, m5C, m5U, 2′-
Title: Predicting Sites of Epitranscriptome Modifications Using Unsupervised Representation Learning Based on Generative Adversarial Networks