AIRBP: Accurate identification of RNA-binding proteins using machine learning techniques
Artificial Intelligence in Medicine (IF 6.1), Pub Date: 2021-02-13, DOI: 10.1016/j.artmed.2021.102034
Avdesh Mishra 1 , Reecha Khanal 2 , Wasi Ul Kabir 2 , Tamjidul Hoque 2

Identifying RNA-binding proteins (RBPs), proteins that bind to ribonucleic acid molecules, is an important problem in computational biology and bioinformatics. Identifying RBPs is essential because they play crucial roles in post-transcriptional control of RNAs and in RNA metabolism, and have diverse roles in biological processes such as splicing, mRNA stabilization, mRNA localization, translation, RNA synthesis, folding and unfolding, modification, processing, and degradation. The existing experimental techniques for identifying RBPs are time-consuming and expensive. Computational methods that identify RBPs directly from the sequence can therefore help annotate RBPs and guide experimental design efficiently. In this work, we present AIRBP, a method built on an advanced machine learning technique called stacking, which predicts RBPs using features extracted from evolutionary information, physicochemical properties, and disordered properties. In addition, AIRBP uses the majority vote of RBPPred, DeepRBPPred, and the stacking model to predict RBPs.
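
The majority vote described above can be illustrated with a minimal Python sketch. It assumes each of the three predictors has already produced a binary label per protein (1 = RBP, 0 = non-RBP). The function name `majority_vote` and the example label lists are hypothetical and are not taken from the AIRBP code release.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary labels (1 = RBP, 0 = non-RBP) from several predictors.

    `predictions` holds one label sequence per predictor; all sequences
    must cover the same proteins in the same order.
    """
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of predictors to avoid ties")
    n_proteins = len(predictions[0])
    if any(len(p) != n_proteins for p in predictions):
        raise ValueError("all predictors must label the same proteins")
    # A protein is called an RBP when more than half of the predictors say so.
    return [
        1 if 2 * sum(votes) > len(predictions) else 0
        for votes in zip(*predictions)
    ]


# Hypothetical labels for four proteins from the three voters.
rbppred_labels = [1, 0, 1, 0]
deeprbppred_labels = [1, 1, 0, 0]
stacking_labels = [0, 1, 1, 0]

final_labels = majority_vote([rbppred_labels, deeprbppred_labels, stacking_labels])
print(final_labels)
```

With three voters, a protein is labeled an RBP whenever at least two of them agree, so a single predictor's error cannot change the final call on its own.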

The results show that AIRBP attains Accuracy (ACC), Balanced Accuracy (BACC), F1-score, and Matthews Correlation Coefficient (MCC) of 95.84%, 94.71%, 0.928, and 0.899, respectively, on the training dataset under 10-fold cross-validation (CV). Further evaluation of AIRBP on independent test sets shows that it achieves ACC, BACC, F1-score, and MCC of 94.36%, 94.28%, 0.897, and 0.860, respectively, on the Human test set; 91.25%, 93.00%, 0.896, and 0.835 on the S. cerevisiae test set; and 90.60%, 90.41%, 0.934, and 0.775 on the A. thaliana test set. These results indicate that AIRBP outperforms the existing Deep- and TriPepSVM methods. AIRBP can therefore be useful for accurate identification and annotation of RBPs directly from the sequence, and may help provide valuable insight for treating critical diseases.
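
For readers unfamiliar with the four reported metrics, the following Python sketch computes them from the counts of a binary confusion matrix using their standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical. This is not the authors' evaluation code, and the example numbers are not results from the paper. Note that the abstract reports ACC and BACC as percentages and F1-score and MCC as fractions, whereas the sketch returns all four as fractions.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard ACC, BACC, F1 and MCC from confusion-matrix counts.

    tp/fn count true RBPs predicted as RBP/non-RBP;
    tn/fp count non-RBPs predicted as non-RBP/RBP.
    """
    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of failing when a class or prediction is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on RBPs
    specificity = ratio(tn, tn + fp)   # recall on non-RBPs
    precision = ratio(tp, tp + fp)

    acc = ratio(tp + tn, tp + tn + fp + fn)
    bacc = (sensitivity + specificity) / 2
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {"ACC": acc, "BACC": bacc, "F1": f1, "MCC": mcc}


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = binary_metrics(tp=40, tn=50, fp=5, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

BACC and MCC are reported alongside ACC because RBP datasets are often class-imbalanced, and plain accuracy can look high when the majority class dominates.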

Availability: Code and data are available at http://cs.uno.edu/~tamjid/Software/AIRBP/code_data.zip




Updated: 2021-02-22