HSM6AP: a high-precision predictor for the Homo sapiens N6-methyladenosine (m6A) based on multiple weights and feature stitching
RNA Biology (IF 4.1), Pub Date: 2021-02-12, DOI: 10.1080/15476286.2021.1875180
Jing Li 1 , Shida He 1 , Fei Guo 1 , Quan Zou 2

ABSTRACT

Recent studies have shown that RNA methylation can affect RNA transcription, metabolism, splicing and stability. RNA methylation has also been associated with cancer, obesity and other diseases. Using human genome information and machine learning, this paper examines how fusing sequence-level and gene-level feature extraction affects the accuracy of methylation site recognition. The discovery of new features exposed significant limitations of existing computational tools: (1) most prediction models are based solely on sequence features and use SVM or random forest as the classifier; (2) limited by the number of samples, such models may not achieve good performance. To establish a better prediction model for methylation sites, we must set specific weighting strategies for training samples and find more powerful and informative feature matrices with which to build a comprehensive model. In this paper, we present HSM6AP, a high-precision predictor for the Homo sapiens N6-methyladenosine (m6A) based on multiple weights and feature stitching. Unlike existing methods, HSM6AP weights its samples during training and explores a wide range of features. Max-Relevance-Max-Distance (MRMD) is employed for feature selection, and the feature matrix is generated by fusing the individual features. Extreme gradient boosting (XGBoost), an ensemble machine learning algorithm based on decision trees, is used for model training, and model performance is improved through parameter tuning. Results on two rigorous independent data sets demonstrate the superiority of HSM6AP in identifying methylation sites. HSM6AP is an advanced predictor that users (especially non-specialist users) can employ directly to predict methylation sites.
Users can access our related tools and data sets at http://lab.malab.cn/~lijing/HSM6AP.html. The code for our tool is publicly available at https://github.com/lijingtju/HSm6AP.git.
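
As a rough illustration of the two ideas named in the title, the sketch below "stitches" two simple sequence encodings into one feature row and derives per-sample weights from class frequencies. It is not the authors' implementation: the abstract does not say which features or which weighting scheme HSM6AP uses, so the one-hot and 2-mer encodings, the inverse-class-frequency weights, the function names and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"  # assumed RNA alphabet; other characters encode as all zeros


def one_hot(seq):
    """Flattened one-hot encoding: four values per sequence position."""
    return [1.0 if base == n else 0.0 for base in seq for n in NUCLEOTIDES]


def kmer_frequencies(seq, k=2):
    """Normalized frequency of every possible k-mer over the ACGU alphabet."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts[kmer] / total for kmer in kmers]


def stitch_features(seq):
    """Concatenate ("stitch") the individual feature vectors into one row."""
    return one_hot(seq) + kmer_frequencies(seq, k=2)


def balanced_sample_weights(labels):
    """Hypothetical weighting: each sample weighted inversely to its class size."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return [n / (n_classes * counts[y]) for y in labels]


# Toy data, invented for illustration only (1 = methylated site, 0 = not).
sequences = ["GGACU", "AAACA", "UGACC", "CCAGU"]
labels = [1, 0, 0, 0]

X = [stitch_features(s) for s in sequences]   # one stitched row per sequence
weights = balanced_sample_weights(labels)     # one weight per training sample
```

With XGBoost installed, rows like `X` and weights like `weights` could then be passed to a classifier, for example through the `sample_weight` argument of `XGBClassifier.fit`. A feature selection step such as MRMD would sit between stitching and training and is not shown here.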




Updated: 2021-02-12