PseKNC and Adaboost-Based Method for DNA-Binding Proteins Recognition
International Journal of Pattern Recognition and Artificial Intelligence (IF 1.5), Pub Date: 2021-02-20, DOI: 10.1142/s0218001421500221
Lina Yang 1 , Xiangyu Li 1 , Ting Shu 2 , Patrick Wang 3 , Xichun Li 4

DNA-binding proteins are essential to DNA function. They are also integral to the life processes of various organisms, for instance DNA recombination and replication. Recognizing such proteins helps medical researchers pinpoint the causes of disease. Traditional techniques for identifying DNA-binding proteins are expensive and time-consuming. Machine learning methods can identify these proteins quickly and efficiently, but the accuracy of existing methods is not high enough. In this paper, we propose a framework to identify DNA-binding proteins. The proposed framework first extracts features by combining three algorithms: PseKNC (ps), MomoKGap (mo), and MomoDiKGap (md). We then apply Adaboost weight ranking to select optimal feature subsets from these three types of features. Based on the selected features, three classifiers (k-nearest neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF)) are applied to classify the proteins. Finally, three predictors for identifying DNA-binding proteins are established: ps+mm, ps+md, and ps+mm+md. We use benchmark and independent datasets to train and evaluate the proposed framework. Three tests are performed: the Jackknife test, 10-fold cross-validation, and an independent test. Among the three predictors, ps+md achieves the highest accuracy. We name the best-performing model psmdDBPs and apply it to identify DNA-binding proteins.
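
The Jackknife test mentioned in the abstract is commonly understood as leave-one-out validation: each sample is predicted by a model built from all the other samples. The following is a minimal sketch of that idea with a plain k-nearest-neighbor classifier. It is not the authors' implementation; the function names `knn_predict` and `jackknife_accuracy`, the choice of Euclidean distance, and the toy feature vectors are all assumptions made for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(X, y, k=3):
    """Leave-one-out: predict each sample from all remaining samples."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Toy, made-up feature vectors (hypothetical; not data from the paper).
# Label 1 stands for "DNA-binding", label 0 for "non-binding".
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
y = [0, 0, 0, 1, 1, 1]

acc = jackknife_accuracy(X, y, k=3)
print(f"Jackknife accuracy on toy data: {acc:.2f}")
```

In a real pipeline the tuples in `X` would be replaced by the selected feature subset for each sequence, and the same loop could wrap an SVM or Random Forest instead of KNN.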

Updated: 2021-02-20