Predicting Hot Spot Residues at Protein–DNA Binding Interfaces Based on Sequence Information
Interdisciplinary Sciences: Computational Life Sciences (IF 4.8), Pub Date: 2020-10-17, DOI: 10.1007/s12539-020-00399-z
Lingsong Yao 1, Huadong Wang 2, Yannan Bin 1

Abstract

Hot spot residues at protein–DNA binding interfaces are important for investigating the underlying mechanism of molecular recognition. Currently, only a few tools are available for identifying hot spot residues in protein–DNA complexes, and these tools require three-dimensional protein structures. However, three-dimensional structures are unavailable for most proteins. To address this limitation, we proposed a method, named SPDH, for predicting hot spot residues based only on protein sequences. First, we obtained 133 features from physicochemical properties, conservation, predicted solvent accessible surface area and structure. Then, we systematically assessed these features using various feature selection methods to obtain the optimal feature subset, and compared models built with four classical machine learning algorithms (support vector machine, random forest, logistic regression, and k-nearest neighbor) on the training dataset. We found that the variability of physicochemical property features between wild and mutant types was important for improving the performance of the prediction model. On the independent test set, our method achieved an AUC of 0.760 and a sensitivity of 0.808, and outperformed other methods. The data and source code can be downloaded at https://github.com/xialab-ahu/SPDH.
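
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how AUC and sensitivity are computed for a binary hot spot classifier. It is not taken from the SPDH repository: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers bear no relation to the paper's results.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true hot spots (label 1) that are predicted as hot spots."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = hot spot, 0 = non-hot spot.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"AUC = {roc_auc(labels, scores):.4f}")
print(f"Sensitivity = {sensitivity(labels, preds):.4f}")
```

AUC is threshold-free and depends only on how the scores rank positives against negatives, whereas sensitivity depends on the chosen threshold, which is why the two values are reported separately.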

Updated: 2020-10-17