Predicting protein complexes using a supervised learning method combined with local structural information
PLOS ONE (IF 2.9), Pub Date: 2018-03-19, DOI: 10.1371/journal.pone.0194124
Yadong Dong, Yongqi Sun, Chao Qin

Existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most unsupervised methods assume that protein complexes lie in dense regions of protein-protein interaction (PPI) networks, even though many true complexes are not dense subgraphs. Supervised methods exploit the informative properties of known complexes: they typically extract features from existing complexes and use those features to train a classification model, which then guides the search for new complexes. However, an insufficient set of extracted features, noise in the PPI data, and the incompleteness of complex data make the classification model imprecise, so the model alone is not sufficient for guiding complex detection. We therefore propose a new robust score function that combines the classification model with local structural information. Based on this score function, we provide a search method that works both forwards and backwards. Experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance than state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, in some cases by a significant margin.
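The abstract does not give the form of the score function or the details of the search, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the score is a weighted sum of a classifier output and subgraph density (standing in for "local structural information"), and it reads "forwards and backwards" as greedily adding neighbouring proteins and removing current members. The names `density`, `combined_score`, `bidirectional_search`, `alpha` and `toy_classifier` are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# Undirected PPI network as adjacency sets: protein -> set of interaction partners.
Graph = Dict[str, Set[str]]
# Hypothetical classifier interface: returns a complex-likeness value in [0, 1].
Classifier = Callable[[Graph, Set[str]], float]


def density(graph: Graph, nodes: Set[str]) -> float:
    """Fraction of possible edges present among `nodes` (assumed structural term)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(len(graph[u] & nodes) for u in nodes) / 2
    return edges / (n * (n - 1) / 2)


def combined_score(graph: Graph, nodes: Set[str], classifier: Classifier,
                   alpha: float = 0.5) -> float:
    """Assumed form: weighted sum of classifier output and density."""
    return alpha * classifier(graph, nodes) + (1 - alpha) * density(graph, nodes)


def bidirectional_search(graph: Graph, seed: Set[str], classifier: Classifier,
                         alpha: float = 0.5,
                         max_size: int = 20) -> Tuple[Set[str], float]:
    """Greedy local search that tries both adding and removing one protein per step."""
    current = set(seed)
    best = combined_score(graph, current, classifier, alpha)
    improved = True
    while improved:
        improved = False
        candidates = []
        # Forward moves: add one neighbour of the current candidate complex.
        if len(current) < max_size:
            frontier = set().union(*(graph[u] for u in current)) - current
            candidates += [current | {v} for v in frontier]
        # Backward moves: drop one member, keeping at least two proteins.
        if len(current) > 2:
            candidates += [current - {v} for v in current]
        best_set = current
        for cand in candidates:
            s = combined_score(graph, cand, classifier, alpha)
            if s > best + 1e-12:  # accept strict improvements only, so the loop ends
                best, best_set, improved = s, cand, True
        current = best_set
    return current, best


def toy_classifier(graph: Graph, nodes: Set[str]) -> float:
    """Stand-in for a trained model: prefers candidates with at least three proteins."""
    return min(len(nodes) / 3, 1.0)


if __name__ == "__main__":
    # Triangle a-b-c with a pendant protein d attached to c.
    ppi = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    print(bidirectional_search(ppi, {"a", "b"}, toy_classifier))
```

In this sketch the classifier is passed in as a plain callable so that any trained model can be substituted. The weighting `alpha` and the use of density as the structural term are illustrative choices only.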




Updated: 2018-03-20