RF-SVM: Identification of DNA-binding proteins based on comprehensive feature representation methods and support vector machine
Proteins: Structure, Function, and Bioinformatics (IF 3.2). Pub Date: 2021-08-29. DOI: 10.1002/prot.26229
Yanping Zhang, Jianwei Ni, Ya Gao

Protein-DNA interactions play an important role in biological processes such as DNA replication, repair, and modification. To better understand their functions, one of the most important steps is the identification of DNA-binding proteins. We propose a DNA-binding protein predictor, RF-SVM, which combines four types of features: pseudo amino acid composition (PseAAC), amino acid distribution (AAD), adjacent amino acid composition frequency (ACF), and Local-DPP. The random forest algorithm is used to select the top 174 features, which are then used to build the predictor model with a support vector machine (SVM) on the training dataset UniSwiss-Tr. Finally, RF-SVM is compared with other existing methods on the test dataset UniSwiss-Tst. The experimental results show that RF-SVM achieves an accuracy of 84.25%. We also find that the amino acid physicochemical properties OOBM770101(H), CIDH920104(H), MIYS990104(H), NISK860101(H), VINM940103(H), and SNEP660101(A) contribute to the prediction of DNA-binding proteins. The main code and datasets are available at https://github.com/NiJianWei996/RF-SVM.
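
The two-stage idea in the abstract (rank features with a random forest, keep the top 174, then train an SVM on them) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation, which is in the repository linked above. The synthetic data, the forest size, the RBF kernel, the feature scaling step, and the helper names `select_top_features` and `train_rf_svm` are all assumptions made for illustration; only the count of 174 selected features comes from the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

TOP_K = 174  # number of features kept, as stated in the abstract


def select_top_features(X_train, y_train, k=TOP_K, seed=0):
    """Rank features by random forest importance; return the top-k column indices."""
    # Assumption: 200 trees and impurity-based importances; the paper may differ.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)
    order = np.argsort(forest.feature_importances_)[::-1]  # most important first
    return np.sort(order[:k])


def train_rf_svm(X_train, y_train, k=TOP_K, seed=0):
    """Select features on the training set only, then fit an SVM on those columns."""
    selected = select_top_features(X_train, y_train, k, seed)
    # Assumption: standardized inputs and an RBF kernel with default parameters.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(X_train[:, selected], y_train)
    return selected, model


# Synthetic stand-in for the feature matrix (PseAAC, AAD, ACF, Local-DPP in the
# paper); these are NOT the UniSwiss-Tr / UniSwiss-Tst datasets.
X, y = make_classification(
    n_samples=400, n_features=400, n_informative=30, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

selected, model = train_rf_svm(X_train, y_train)
accuracy = model.score(X_test[:, selected], y_test)
print(f"kept {len(selected)} features; held-out accuracy on synthetic data: {accuracy:.3f}")
```

Selecting features using only the training split, as done here, keeps the test split from influencing which columns are kept. The accuracy printed by this sketch describes the synthetic data only and is unrelated to the 84.25% reported in the abstract.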

Updated: 2021-08-29