Clustered tree regression to learn protein energy change with mutated amino acid.
Briefings in Bioinformatics (IF 6.8), Pub Date: 2022-11-19, DOI: 10.1093/bib/bbac374
Hongwei Tu, Yanqiang Han, Zhilong Wang, Jinjin Li

Accurately and efficiently predicting mutation-induced changes in protein energy remains a major challenge, and a topic of great interest, in computational biology. Experimental techniques are severely limited by their high resource cost, and structure-based prediction methods by the scarcity of protein structural information. Here, we design a structure-independent protocol that accurately and efficiently predicts the mutation-induced change in protein folding free energy using only sequence, physicochemical, and evolutionary features. The proposed clustered tree regression protocol exploits inherent patterns in the data by combining unsupervised feature clustering with K-means and supervised tree regression with XGBoost. This enables fast and accurate predictions for proteins carrying different mutations, with an average Pearson correlation coefficient of 0.83 and an average root-mean-square error of 0.94 kcal/mol. Because it relies only on sequence-derived information, the method removes the dependence on protein structures and could be applied to proteins for which little structural information is available.
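The core idea of the abstract, unsupervised K-means clustering combined with supervised tree regression, can be illustrated with a small, dependency-free Python sketch. It rests on several assumptions that the abstract does not state. It reads "feature clustering" as grouping mutation samples by their feature vectors and training one regressor per cluster. It replaces XGBoost with a much simpler stand-in: gradient boosting of depth-1 regression trees (stumps) under squared loss. The toy two-regime data, the cluster count `k`, the learning rate, and the number of rounds are all hypothetical. It also computes the two metrics the abstract reports, the Pearson correlation coefficient (PCC) and root-mean-square error (RMSE). This is not the authors' implementation.

```python
import math
import random
from statistics import fmean

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [fmean(col) for col in zip(*members)]
    return centroids, labels

def fit_stump(X, r):
    """Least-squares depth-1 regression tree (stump) fitted to residuals r."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not right:
                continue  # a split must send at least one sample each way
            lv, rv = fmean(left), fmean(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # all samples identical: fall back to a constant
        m = fmean(r)
        return 0, math.inf, m, m
    return best[1:]

def fit_boosted_stumps(X, y, n_rounds=30, lr=0.3):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = fmean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lv, rv = fit_stump(X, resid)
        stumps.append((j, t, lv, rv))
        pred = [p + lr * (lv if x[j] <= t else rv) for p, x in zip(pred, X)]
    return base, lr, stumps

def predict_boosted(model, x):
    base, lr, stumps = model
    return base + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps)

def fit_clustered(X, y, k=2):
    """Cluster samples with K-means, then fit one boosted model per cluster."""
    centroids, labels = kmeans(X, k)
    models = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            models[c] = fit_boosted_stumps([X[i] for i in idx], [y[i] for i in idx])
    return centroids, models

def predict_clustered(model, x):
    """Route x to its nearest non-empty cluster and use that cluster's regressor."""
    centroids, models = model
    c = min(models, key=lambda c: sq_dist(x, centroids[c]))
    return predict_boosted(models[c], x)

def pearson(a, b):
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb)

def rmse(a, b):
    return math.sqrt(fmean([(u - v) ** 2 for u, v in zip(a, b)]))

if __name__ == "__main__":
    # Hypothetical toy data: two feature regimes, each with its own feature-to-target rule.
    rng = random.Random(1)
    X, y = [], []
    for _ in range(40):
        a = [rng.uniform(0, 1), rng.uniform(0, 1)]
        b = [rng.uniform(5, 6), rng.uniform(5, 6)]
        X += [a, b]
        y += [2 * a[0], -3 * b[1]]
    model = fit_clustered(X, y, k=2)
    preds = [predict_clustered(model, x) for x in X]
    # In-sample metrics, for illustration only.
    print(f"PCC={pearson(y, preds):.3f}  RMSE={rmse(y, preds):.3f}")
```

The design choice being illustrated is routing: each sample is assigned to its nearest K-means centroid, and only that cluster's regressor predicts it, so each regressor can specialize to one region of feature space. In a real pipeline one would more likely use `sklearn.cluster.KMeans` and `xgboost.XGBRegressor`, and report PCC and RMSE on held-out mutations rather than the in-sample figures the demo prints.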

Updated: 2022-09-17