Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier.
Computers in Biology and Medicine (IF 7.7), Pub Date: 2020-07-15, DOI: 10.1016/j.compbiomed.2020.103899
Cheng Chen 1 , Qingmei Zhang 1 , Bin Yu 2 , Zhaomin Yu 1 , Patrick J Lawrence 3 , Qin Ma 3 , Yan Zhang 4

Protein-protein interactions (PPIs) are involved in most cellular activities at the proteomic level, making the study of PPIs necessary for understanding any biological process. Machine learning approaches have been explored, leading to more accurate and generalizable PPI predictions. In this paper, we propose a predictive framework called StackPPI. First, we use pseudo amino acid composition; Moreau-Broto, Moran and Geary autocorrelation descriptors; amino acid composition position-specific scoring matrix; bi-gram position-specific scoring matrix; and composition, transition and distribution to encode biologically relevant features. Second, we employ XGBoost to reduce feature noise and perform dimensionality reduction through gradient boosting and average gain. Finally, the resulting optimized features are analyzed by StackPPI, a PPI predictor we have developed from a stacked ensemble classifier consisting of random forest, extremely randomized trees and logistic regression algorithms. Five-fold cross-validation shows that StackPPI can successfully predict PPIs, with an ACC of 89.27%, MCC of 0.7859 and AUC of 0.9561 on Helicobacter pylori, and an ACC of 94.64%, MCC of 0.8934 and AUC of 0.9810 on Saccharomyces cerevisiae. We find that StackPPI improves protein interaction prediction accuracy on independent test sets compared to state-of-the-art models. Finally, we highlight StackPPI's ability to infer biologically significant PPI networks. StackPPI's accurate prediction of functional pathways makes it the logical choice for studying the underlying mechanism of PPIs, especially as it applies to drug design. The datasets and source code used to create StackPPI are available at https://github.com/QUST-AIBBDRC/StackPPI/.
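
The abstract reports ACC, MCC and AUC without defining them. As a reading aid, the following minimal sketch computes the three metrics under their standard textbook definitions, using only the Python standard library. The function names and the toy counts and scores are illustrative assumptions; they are not taken from the paper or from the StackPPI repository, and the authors' own evaluation code may differ.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 to 1. By a common convention (an assumption here),
    0.0 is returned when any marginal total is zero and the
    coefficient is undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one; ties count as half.
    Quadratic in the number of examples, so suitable only for small inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the paper).
    print(accuracy(tp=45, tn=40, fp=10, fn=5))
    print(mcc(tp=45, tn=40, fp=10, fn=5))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Unlike ACC, MCC uses all four confusion-matrix cells, so it stays informative when the interacting and non-interacting classes are imbalanced; this is one reason the two are often reported together.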




Updated: 2020-07-23