ANOVA-particle swarm optimization-based feature selection and gradient boosting machine classifier for improved protein–protein interaction prediction
Proteins: Structure, Function, and Bioinformatics (IF 3.2) Pub Date: 2021-09-15, DOI: 10.1002/prot.26236
Satyajit Mahapatra, Sitanshu Sekhar Sahu

Feature fusion and selection strategies have been applied to improve the accuracy of protein–protein interaction (PPI) prediction. In this paper, an embedded feature selection framework, termed AVPSO, is developed by integrating a cost function based on analysis of variance (ANOVA) with particle swarm optimization (PSO). First, features extracted from the protein sequences using pseudo-amino acid composition (PseAAC), conjoint triad composition, and local descriptor are fused. AVPSO is then employed to select the optimal feature subset, and the light gradient boosting machine (LGBM) classifier uses this subset to predict PPIs. In five-fold cross-validation, the proposed model (AVPSO-LGBM) achieved average accuracies of 97.12% and 95.09% on the intraspecies PPI datasets Saccharomyces cerevisiae and Helicobacter pylori, respectively. On the interspecies PPI datasets Human-Bacillus and Human-Yersinia, it achieved average accuracies of 95.20% and 93.44%, respectively. Results on independent test datasets and network datasets show that the prediction accuracy of AVPSO-LGBM is better than that of existing methods, demonstrating its generalization ability. This improved prediction performance makes the proposed model a reliable and effective tool for PPI prediction.
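
The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the AVPSO cost function, so the `cost` function, the `penalty` weight, the PSO coefficients, and the function names `anova_f_scores` and `binary_pso_select` are all assumptions made for illustration. The sketch computes a one-way ANOVA F statistic per feature and runs a basic binary PSO over feature masks; the feature extraction (PseAAC, conjoint triad, local descriptor) and the LGBM classification stage are omitted, and the fused features are replaced by synthetic data.

```python
import numpy as np


def anova_f_scores(X, y):
    """One-way ANOVA F statistic of each feature column against the class labels."""
    classes = np.unique(y)
    n_samples, n_classes = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    return ms_between / (ms_within + 1e-12)  # small constant avoids division by zero


def binary_pso_select(scores, n_particles=30, n_iter=60, penalty=0.2, seed=0):
    """Binary PSO over feature masks; the cost is a hypothetical stand-in."""
    rng = np.random.default_rng(seed)
    n_features = scores.size
    norm = scores / scores.max()

    def cost(bits):
        mask = bits.astype(bool)
        if not mask.any():
            return np.inf  # an empty subset is never acceptable
        # Lower is better: each selected feature earns its normalized F-score
        # and pays a fixed penalty (assumed form, not taken from the paper).
        return float((penalty - norm[mask]).sum() / n_features)

    # Positions are 0/1 integer masks; velocities are real-valued.
    positions = (rng.random((n_particles, n_features)) < 0.5).astype(int)
    velocities = np.zeros((n_particles, n_features))
    pbest = positions.copy()
    pbest_cost = np.array([cost(p) for p in positions])
    gbest = pbest[pbest_cost.argmin()].copy()
    gbest_cost = pbest_cost.min()

    for _ in range(n_iter):
        r1 = rng.random((n_particles, n_features))
        r2 = rng.random((n_particles, n_features))
        # Pull every bit toward the personal best and the global best.
        velocities += 2.0 * r1 * (pbest - positions)
        velocities += 2.0 * r2 * (gbest - positions)
        velocities = np.clip(velocities, -4.0, 4.0)
        # Sigmoid transfer: velocity becomes the probability of the bit being 1.
        prob_one = 1.0 / (1.0 + np.exp(-velocities))
        positions = (rng.random((n_particles, n_features)) < prob_one).astype(int)

        costs = np.array([cost(p) for p in positions])
        improved = costs < pbest_cost
        pbest[improved] = positions[improved]
        pbest_cost[improved] = costs[improved]
        if pbest_cost.min() < gbest_cost:
            gbest_cost = pbest_cost.min()
            gbest = pbest[pbest_cost.argmin()].copy()

    return gbest.astype(bool)


# Synthetic stand-in for the fused feature matrix: 200 samples, 8 features,
# of which only the first three carry class signal.
demo_rng = np.random.default_rng(42)
y = np.repeat([0, 1], 100)
X = demo_rng.normal(size=(200, 8))
X[:, :3] += 2.0 * y[:, None]
mask = binary_pso_select(anova_f_scores(X, y))
print("selected feature indices:", np.flatnonzero(mask))
```

In a full pipeline the resulting boolean mask would be applied to the fused feature matrix (`X[:, mask]`) before training the classifier. Because the assumed cost is a simple sum over features, it behaves much like thresholding the F-scores; a cost that also reflects classifier performance would make the swarm search more meaningful.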

Updated: 2021-09-15