RBPro-RF: Use Chou's 5-steps rule to predicting RNA-binding proteins via random forest with elastic net
Chemometrics and Intelligent Laboratory Systems ( IF 3.7 ) Pub Date : 2020-02-01 , DOI: 10.1016/j.chemolab.2019.103919
Xiaomeng Sun , Tingyu Jin , Cheng Chen , Xiaowen Cui , Qin Ma , Bin Yu

Abstract RNA-protein interactions are essential for the regulation of gene expression, cell defense, developmental regulation and other life activities, so applying machine learning to predict RNA-binding proteins (RBPs) has become a research hotspot in bioinformatics. We propose a new method for predicting RBPs, called RBPro-RF. First, the feature vectors of the protein sequence are extracted by fusing composition-transition-distribution (C-T-D), pseudo-amino acid composition (PseAAC) and position-specific scoring matrix-400 (PSSM-400). Second, the synthetic minority oversampling technique (SMOTE) and the edited nearest neighbor (ENN) are employed to balance the samples. Then, elastic net (EN) is used to eliminate redundant features and retain the important features that represent RBPs. Finally, the optimal feature vectors are fed into a random forest classifier to predict RBPs. Ten-fold cross-validation on the training set gives an ACC of 97.43% and an MCC of 0.933. In addition, the accuracies on the three independent test sets Human, S. cerevisiae and A. thaliana are 95.63%, 88.82%, and 92.35%, respectively, which are superior to those of state-of-the-art prediction methods. In summary, the experimental results show that our method can significantly improve the accuracy of RBP prediction. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/RBPro-RF/.
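The abstract reports accuracy (ACC) and the Matthews correlation coefficient (MCC). As a reminder of how these two metrics are usually computed for a binary classifier, here is a minimal sketch using only the Python standard library. The function names and the example counts are illustrative assumptions; they are not taken from the RBPro-RF repository and do not reproduce the paper's results.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for binary labels; `positive` marks the RBP class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the paper's data).
    tp, tn, fp, fn = 90, 95, 5, 10
    print(f"ACC = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.4f}")
```

MCC is often reported alongside ACC because it stays informative when the two classes are unbalanced, which is the situation the SMOTE and ENN step in the abstract is meant to address.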

Updated: 2020-02-01