Elsevier

Genomics

Volume 112, Issue 6, November 2020, Pages 4342-4347

Prediction of N7-methylguanosine sites in human RNA based on optimal sequence features

https://doi.org/10.1016/j.ygeno.2020.07.035
Under an Elsevier user license
open archive

Highlights

  • A novel analytical method is developed to analyze and identify N-7 methylguanosine (m7G) modification sites.

  • Feature selection techniques were proposed and used to optimize RNA sequence features.

  • A software package was constructed to accurately recognize m7G sites in human RNA sequences.

Abstract

N-7 methylguanosine (m7G) modification is a ubiquitous post-transcriptional RNA modification that is vital for maintaining RNA function and protein translation. Computational tools make it easier to predict m7G sites in RNA sequences. In this work, we designed a sequence-based method to identify m7G modification sites in human RNA sequences. First, several kinds of sequence features were extracted to encode m7G and non-m7G samples. Subsequently, we used mRMR, F-score, and Relief to obtain the optimal feature subset, that is, the subset that produced the maximum prediction accuracy. In 10-fold cross-validation, the highest accuracy for identifying m7G sites in the human genome, 94.67%, was achieved by a support vector machine (SVM). In addition, we examined the performance of other algorithms and found that the SVM-based model outperformed them. The results indicate that the predictor could be a useful tool for studying m7G. A prediction model is available at https://github.com/MapFM/m7g_model.git.
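The encode-then-rank part of this pipeline can be illustrated with a minimal sketch. The abstract does not say which sequence features were used, so the k-mer frequency encoding below is an assumption chosen only for illustration. The F-score is one of the three selection methods named above; mRMR, Relief, and the SVM classifier are not shown. The function names (`kmer_frequencies`, `f_score`, `rank_features`) and the toy sequences are hypothetical and are not taken from the authors' package.

```python
from itertools import product
from statistics import mean, variance


def kmer_frequencies(seq, k=2):
    """Encode an RNA sequence as normalized k-mer frequencies (4**k values)."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def f_score(pos_values, neg_values):
    """F-score of one feature: class-mean separation over within-class variance.

    Each class needs at least two samples, because sample variance is used.
    """
    all_mean = mean(pos_values + neg_values)
    numerator = (mean(pos_values) - all_mean) ** 2 + (mean(neg_values) - all_mean) ** 2
    denominator = variance(pos_values) + variance(neg_values)
    if denominator == 0:
        # No within-class spread: perfectly separating or completely uninformative.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def rank_features(pos_matrix, neg_matrix):
    """Return feature indices sorted from highest to lowest F-score."""
    n_features = len(pos_matrix[0])
    scores = [
        f_score([row[j] for row in pos_matrix], [row[j] for row in neg_matrix])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical sequences, not real m7G sites).
positives = ["GGACUGGACU", "GGACAGGACA", "UGGACGGACU"]
negatives = ["AUAUCGCGAU", "CCAUAUAUGC", "UAUAGCGCAU"]

pos_matrix = [kmer_frequencies(s) for s in positives]
neg_matrix = [kmer_frequencies(s) for s in negatives]
top_features = rank_features(pos_matrix, neg_matrix)[:5]
print(top_features)
```

In a workflow like the one described, the ranked features would typically be added one at a time to a classifier such as an SVM, and the subset with the best cross-validated accuracy would be kept. That step is omitted here to keep the sketch free of external dependencies.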

Keywords

N-7 methylguanosine
Feature extraction
Feature selection
Feature analysis
Software package
