SulSite-GTB: identification of protein S-sulfenylation sites by fusing multiple feature information and gradient tree boosting
Neural Computing and Applications (IF 4.5), Pub Date: 2020-03-13, DOI: 10.1007/s00521-020-04792-z
Minghui Wang, Xiaowen Cui, Bin Yu, Cheng Chen, Qin Ma, Hongyan Zhou

Abstract

Protein cysteine S-sulfenylation is an essential and reversible post-translational modification that plays a crucial role in transcriptional regulation, stress response, cell signaling and protein function. Studies have shown that S-sulfenylation is involved in many human diseases, such as cancer, diabetes and arteriosclerosis. However, experimental identification of protein S-sulfenylation sites is generally expensive and time-consuming. In this study, we propose SulSite-GTB, a new method for predicting protein S-sulfenylation sites. First, features from amino acid composition, dipeptide composition, encoding based on grouped weight, K nearest neighbors, position-specific amino acid propensity, position-weighted amino acid composition and pseudo-position-specific score matrix are extracted and fused to obtain the initial feature space. Second, the synthetic minority oversampling technique (SMOTE) is used to handle the class imbalance in the data, and the least absolute shrinkage and selection operator (LASSO) is employed to remove redundant and irrelevant features. Finally, the optimal feature subset is input into the gradient tree boosting classifier to predict S-sulfenylation sites, and five-fold cross-validation and an independent test set are used to evaluate the prediction performance of the model. Experimental results show that the overall prediction accuracy is 92.86% on the training set and 88.53% on the independent test set, with AUC values of 0.9706 and 0.9425, respectively. Comparisons with other prediction methods show that SulSite-GTB is significantly superior to other state-of-the-art methods and provides a new idea for predicting post-translational modification sites in other proteins. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/SulSite-GTB/.
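
As an illustration of the first step, the two simplest encodings named in the abstract, amino acid composition and dipeptide composition, can be sketched in plain Python and fused by concatenation. This is a minimal sketch and not the authors' implementation: the function names, the example peptide, and the choice to skip non-standard residues (such as padding characters) are assumptions, and the abstract does not state the window length or normalization actually used.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(peptide):
    """Return the 20-dimensional frequency of each standard residue."""
    # Assumption: non-standard symbols (e.g. padding "X") are ignored.
    residues = [r for r in peptide.upper() if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(peptide):
    """Return the 400-dimensional frequency of each adjacent residue pair."""
    seq = peptide.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Assumption: pairs containing a non-standard symbol are skipped.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(pairs)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    return [counts[d] / total for d in DIPEPTIDES]


def fused_features(peptide):
    """Concatenate the two encodings into one 420-dimensional vector."""
    return amino_acid_composition(peptide) + dipeptide_composition(peptide)


# Hypothetical peptide window, used only to show the output shape.
features = fused_features("AACCA")
print(len(features))  # 420
```

In the full method, vectors like this would be joined with the remaining five encodings, balanced with SMOTE, reduced with LASSO, and passed to the gradient tree boosting classifier; those stages are omitted here.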




Updated: 2020-03-16