Inter-protein residue covariation information unravels physically interacting protein dimers
BMC Bioinformatics ( IF 3 ) Pub Date : 2020-12-17 , DOI: 10.1186/s12859-020-03930-7
Sara Salmanian , Hamid Pezeshk , Mehdi Sadeghi

Predicting physical interactions between proteins is one of the greatest challenges in computational biology. Protein interactions are highly varied, and a huge number of protein sequences and synthetic peptides have unknown interaction partners. Most co-evolutionary methods detect a mixture of physical interactions and functional associations, and only a handful of approaches specifically infer physical interactions. Hybrid co-evolutionary methods exploit inter-protein residue coevolution to identify proteins that physically interact. In this study, we introduce a hybrid co-evolutionary approach that predicts physical interactions between pairs of protein families, starting from protein sequences only. Pairs of multiple sequence alignments are constructed for each dimer, and the covariation between residues in those pairs is calculated by CCMpred (Contacts from Correlated Mutations predicted) and three mutual information based approaches for ten accessible surface area threshold groups. All residue couplings between the two proteins of each dimer are then summarized as a single Frobenius norm value. The norms of the residue contact matrices of all dimers at different accessible surface area thresholds are fed into a support vector machine as single-feature or multiple-feature models. Training the classifiers on single features shows no apparent difference in accuracy among the methods across accessible surface area thresholds. Nevertheless, across the accessible surface area cut-offs, mutual information product tends to perform somewhat better overall, and context likelihood of relatedness somewhat worse, than the other two methods. The results also demonstrate that training the support vector machine with multiple norm features from several accessible surface area thresholds considerably improves prediction performance.
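As a rough illustration of the feature construction described above, the following minimal sketch reduces an inter-protein coupling matrix to one Frobenius norm per accessible surface area threshold. The function names, the toy scores, and in particular the masking rule (keeping a residue pair only when both residues meet the threshold) are assumptions made for illustration; the abstract does not specify how the thresholds are applied, and this is not the authors' implementation.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


def norm_features(coupling, asa_a, asa_b, thresholds):
    """Return one Frobenius norm per accessible surface area threshold.

    coupling[i][j] is a covariation score between residue i of protein A
    and residue j of protein B. ASSUMPTION: a pair is kept only if both
    residues have relative accessible surface area >= the threshold.
    """
    features = []
    for threshold in thresholds:
        masked = [
            [
                score if (asa_a[i] >= threshold and asa_b[j] >= threshold) else 0.0
                for j, score in enumerate(row)
            ]
            for i, row in enumerate(coupling)
        ]
        features.append(frobenius_norm(masked))
    return features


# Hypothetical toy data: a 2 x 2 coupling matrix and per-residue ASA values.
coupling = [[3.0, 0.0], [0.0, 4.0]]
asa_a = [0.5, 0.1]
asa_b = [0.5, 0.1]

# Two thresholds yield a two-element feature vector for this dimer.
features = norm_features(coupling, asa_a, asa_b, thresholds=[0.0, 0.3])
print(features)
```

Each dimer would contribute one such vector, and a classifier such as a support vector machine could then be trained on either a single entry or the whole vector, matching the single-feature and multiple-feature models mentioned in the text.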
With multiple norm features, CCMpred achieves a somewhat better overall performance than the mutual information based approaches. The best accuracy, sensitivity, specificity, precision and negative predictive value for that method are 0.98, 1, 0.962, 0.96, and 0.962, respectively. By feeding the norm values of protein dimers at different accessible surface area thresholds into support vector machines, we demonstrate that even a small number of proteins in pairs of multiple alignments can allow one to accurately discriminate between positive and negative dimers.
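For readers unfamiliar with the five reported measures, the sketch below gives their standard definitions in terms of confusion-matrix counts. The function name and the counts are hypothetical and purely illustrative; they are not taken from the paper and do not reproduce its reported values.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for interacting (positive) and non-interacting
# (negative) dimers; chosen only to exercise the formulas.
metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```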

Updated: 2020-12-17