Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods.
Evolutionary Bioinformatics (IF 1.7), Pub Date: 2020-07-20, DOI: 10.1177/1176934320915707
Hao Xue (1, 2), Zhen Wei (3, 4), Kunqi Chen (3, 4), Yujiao Tang (3, 4), Xiangyu Wu (3, 4), Jionglong Su (1), Jia Meng (3, 5)

RNA N6-methyladenosine (m6A) has emerged as an important epigenetic modification because of its role in regulating the stability, structure, processing, and translation of RNA. Disruption of m6A homeostasis may lead to defects in stem cell regulation, reduced fertility, and increased cancer risk. To date, experimental detection and quantification of RNA m6A modification remain time-consuming and labor-intensive. Existing databases contain only a limited number of epitranscriptome samples, and a matched RNA methylation profile is often unavailable for a given biological problem of interest. Because gene expression data are readily available for most biological problems, it would be appealing to estimate RNA methylation status from gene expression data in silico. In this study, we explored the computational prediction of RNA methylation status from gene expression data using classification and regression methods, based on mouse RNA methylation data collected under 73 experimental conditions. Elastic Net-regularized Logistic Regression (ENLR), Support Vector Machine (SVM), and Random Forests (RF) classifiers were constructed. SVM and RF both achieved the best performance, with a mean area under the curve (AUC) of 0.84 across samples; SVM had a narrower AUC spread. Gene Site Enrichment Analysis was conducted on the sites selected by ENLR as predictors to assess the biological significance of the model. Three functional annotation terms were statistically significant: phosphoprotein, SRC Homology 3 (SH3) domain, and endoplasmic reticulum, all of which were found to be closely related to the m6A pathway. For regression analysis, Elastic Net was implemented, yielding a mean Pearson correlation coefficient of 0.68 and a mean Spearman correlation coefficient of 0.64. Our exploratory study suggests that gene expression data could be used to construct predictors of m6A methylation status with adequate accuracy.
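The abstract uses Elastic Net regularization for both the classifier (ENLR) and the regression model but does not give the objective. For orientation, one common parameterization (the one used by the glmnet package) is written below. The abstract does not state which form, mixing parameter α, or penalty strength λ the authors used, so these symbols and this notation are our assumptions: x_i is the vector of gene expression predictors for observation i, and y_i is a binary methylation label for ENLR or a methylation level for regression.

```latex
% Elastic Net-regularized logistic regression (classification), y_i \in \{0, 1\}
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta) - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big] + \lambda\,P_\alpha(\beta)

% Elastic Net (linear regression), y_i \in \mathbb{R}
\min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\big(y_i - \beta_0 - x_i^{\top}\beta\big)^2 + \lambda\,P_\alpha(\beta)

% Shared penalty, 0 \le \alpha \le 1 (alpha = 1: lasso, alpha = 0: ridge)
P_\alpha(\beta) = \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1
```

With α > 0, the ℓ1 term drives some coefficients exactly to zero, so a fitted model retains only a subset of predictors; this is the sense in which an ENLR model can "select" features for a downstream enrichment analysis. The ℓ2 term spreads weight across correlated predictors, such as co-expressed genes, instead of arbitrarily keeping one of them.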
Our work showed, for the first time, that RNA methylation status may be predicted from matched gene expression data. This finding may facilitate RNA modification research in various biological contexts when a matched RNA methylation profile is not available, especially at the very early stage of a study.
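The abstract summarizes classification by a mean AUC across samples and regression by mean Pearson and Spearman correlations. The sketch below shows one way such metrics can be computed in plain Python; it is not the authors' code. It assumes that each "sample" (for example, one experimental condition) yields binary labels (1 = methylated) with classifier scores, or observed and predicted methylation levels. The function names and toy numbers are hypothetical and are not data from the study.

```python
import math
from statistics import mean


def auc(labels, scores):
    """AUC as the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs where the positive scores higher.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data: two "samples", each with binary labels
    # (1 = methylated site) and classifier scores.
    samples = [
        ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
        ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
    ]
    print(mean(auc(y, s) for y, s in samples))  # 0.875

    # Hypothetical observed vs. predicted methylation levels for one sample.
    observed = [0.1, 0.4, 0.35, 0.8]
    predicted = [0.2, 0.3, 0.5, 0.7]
    print(spearman(observed, predicted))  # 0.8
```

The pairwise AUC loop is quadratic in the number of sites but easy to check by hand; for large site sets, the equivalent rank-sum formula gives the same value in O(n log n). Spearman's coefficient is computed as Pearson's coefficient on average ranks, which handles tied values.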



Updated: 2020-07-20