GTB-PPI: Predict Protein–protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting
Genomics, Proteomics & Bioinformatics (IF 9.5). Pub Date: 2021-01-27. DOI: 10.1016/j.gpb.2021.01.001
Bin Yu 1 , Cheng Chen 2 , Hongyan Zhou 2 , Bingqiang Liu 3 , Qin Ma 4

Protein–protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI can be applied to the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
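
The last three stages described above (L1-regularized feature selection, gradient tree boosting, and five-fold cross-validation) can be sketched with scikit-learn. This is a minimal illustration and not the authors' implementation, which is available at the repository linked above. The synthetic data standing in for the fused PseAAC/PsePSSM/RSIV/AD features, the choice of scikit-learn, and all hyperparameter values (`C`, `n_estimators`, the random seeds) are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic features replace the fused sequence descriptors.
# Each row plays the role of one protein pair; y marks interacting (1) or not (0).
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=8, n_redundant=10,
    random_state=0,
)

# Step 2: the L1 penalty drives some coefficients to exactly zero;
# SelectFromModel keeps only the features with non-zero weight.
l1_selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5)  # C is illustrative
)

# Step 3: gradient tree boosting trained on the selected subset.
gtb = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Selection sits inside the pipeline so it is refit on each training fold,
# which keeps the held-out fold from influencing which features are chosen.
model = make_pipeline(StandardScaler(), l1_selector, gtb)

# Five-fold cross-validation, scored by accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("fold accuracies:", [round(float(s), 3) for s in scores])
print("mean accuracy:", round(float(scores.mean()), 3))
```

The accuracies printed here reflect only the synthetic data and say nothing about the figures reported in the abstract.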




Updated: 2021-01-27