MutTMPredictor: robust and accurate cascade XGBoost classifier for prediction of mutations in transmembrane proteins
Computational and Structural Biotechnology Journal ( IF 6 ) Pub Date : 2021-11-19 , DOI: 10.1016/j.csbj.2021.11.024
Fang Ge 1 , Yi-Heng Zhu 1 , Jian Xu 1 , Arif Muhammad 1, 2 , Jiangning Song 3, 4 , Dong-Jun Yu 1

Transmembrane proteins have critical biological functions and play roles in many cellular processes, including cell signaling and the transport of molecules and ions across membranes. Approximately 60% of transmembrane proteins are considered drug targets. Missense mutations in such proteins can lead to diverse diseases and disorders, such as neurodegenerative diseases and cystic fibrosis. However, studies on mutations in transmembrane proteins remain limited. In this work, we first design a new feature encoding method, termed weight attenuation position-specific scoring matrix (WAPSSM), which builds upon protein evolutionary information. We then propose a new mutation prediction algorithm (cascade XGBoost) that draws on ideas from consensus predictors and gcForest. Multi-level experiments illustrate the effectiveness of both WAPSSM and the cascade XGBoost algorithm. Finally, by combining WAPSSM and three other types of features with the cascade XGBoost algorithm, we develop a new transmembrane protein mutation predictor, named MutTMPredictor. We benchmark MutTMPredictor against several existing predictors on seven datasets. On the 546-mutation dataset, MutTMPredictor achieves an accuracy (ACC) of 0.9661 and a Matthews correlation coefficient (MCC) of 0.8950. On the 67,584 dataset, it achieves an MCC of 0.7523 and an area under the curve (AUC) of 0.8746, which are 0.1625 and 0.0801 higher, respectively, than those of the best existing predictor (fathmm). MutTMPredictor also outperforms two specific predictors on the Pred-MutHTP datasets. These results suggest that MutTMPredictor is an effective method for predicting and prioritizing missense mutations in transmembrane proteins. The MutTMPredictor webserver and datasets are freely accessible at http://csbio.njust.edu.cn/bioinf/muttmpredictor/ for academic use.
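
The abstract reports performance as ACC and MCC. As a minimal sketch of how these two metrics are conventionally computed for a binary classifier (this is the standard textbook definition, not code from MutTMPredictor; the function names and the toy labels are illustrative assumptions, with 1 taken to mean a disease-associated mutation and 0 a neutral one):

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels, purely illustrative (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print("TP, TN, FP, FN =", counts)
print("ACC =", accuracy(*counts))
print("MCC =", mcc(*counts))
```

Unlike ACC, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are imbalanced, which is one reason both figures are commonly reported together.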




Updated: 2021-11-19