ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures
Bioinformatics (IF 5.8), Pub Date: 2021-09-16, DOI: 10.1093/bioinformatics/btab666
Rahul Kaushik 1, Kam Y J Zhang 1

Motivation: An accurate estimation of the quality of protein model structures is a cornerstone of protein structure prediction. Despite recent groundbreaking successes in protein structure prediction, there remains room to improve model quality estimation at multiple stages of the prediction process and thereby to push prediction accuracy further. Here, we propose ProFitFun, a novel approach for assessing the quality of protein models that harnesses the sequence and structural features of experimental protein structures, namely the preferences of their amino acid residues for backbone dihedral angles and relative surface accessibility at the tripeptide level. The approach captures each residue's backbone dihedral angle and surface accessibility preferences in the context of its N-terminal and C-terminal neighbors in the protein structure. These preferences are used to evaluate protein structures through a machine learning approach, which was tested on an extensive dataset of diverse proteins.

Results: The approach was extensively validated on a large test dataset (n = 25 005) of protein structures, comprising 23 661 models of 82 non-homologous proteins and 1344 non-homologous experimental structures. In addition, an external dataset of 40 000 models of 200 non-homologous proteins was used to validate the proposed method. Both datasets were further used to benchmark the proposed method against four different state-of-the-art methods for protein structure quality assessment. In this benchmarking, the proposed method outperformed some of the state-of-the-art methods in terms of Spearman's and Pearson's correlation coefficients, average GDT-TS loss, sum of z-scores and average absolute difference between predicted and corresponding observed values.
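The benchmarking measures listed in the Results are standard in model quality assessment, but the abstract does not say exactly how they were computed. The sketch below assumes common CASP-style definitions: for each target, Pearson and Spearman correlations between predicted and observed GDT-TS over its models; GDT-TS loss as the observed GDT-TS of the best model minus that of the model the predictor ranks first; the z-score of that top-ranked model's GDT-TS within the target's model pool; and the mean absolute difference between predicted and observed values. Per-target values are averaged, except z-scores, which are summed. The function names and the `targets` input layout are hypothetical.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for constant input
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def _top_ranked(predicted):
    """Index of the model the predictor ranks first (highest predicted score)."""
    return max(range(len(predicted)), key=lambda i: predicted[i])


def gdt_ts_loss(predicted, observed):
    """Assumed definition: best observed GDT-TS minus that of the top-ranked model."""
    return max(observed) - observed[_top_ranked(predicted)]


def top_model_z_score(predicted, observed):
    """Assumed definition: z-score of the top-ranked model within its target's pool."""
    sd = pstdev(observed)
    if sd == 0:
        return 0.0
    return (observed[_top_ranked(predicted)] - mean(observed)) / sd


def mean_abs_diff(predicted, observed):
    """Average |predicted - observed| over one target's models."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def evaluate(targets):
    """targets: {target_id: (predicted, observed)}; a hypothetical input layout."""
    rows = [
        (pearson(p, o), spearman(p, o), gdt_ts_loss(p, o),
         top_model_z_score(p, o), mean_abs_diff(p, o))
        for p, o in targets.values()
    ]
    return {
        "pearson": mean(r[0] for r in rows),
        "spearman": mean(r[1] for r in rows),
        "gdt_ts_loss": mean(r[2] for r in rows),
        "z_score_sum": sum(r[3] for r in rows),
        "mean_abs_diff": mean(r[4] for r in rows),
    }


# Usage with made-up scores (GDT-TS on a 0-1 scale) for two hypothetical targets.
targets = {
    "T1": ([0.9, 0.5, 0.7], [0.8, 0.4, 0.6]),
    "T2": ([0.2, 0.6, 0.4], [0.7, 0.3, 0.5]),
}
print(evaluate(targets))
```

Whether correlations are computed per target and averaged, or pooled over all models, changes the numbers noticeably; per-target averaging is only one plausible choice here.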
The high accuracy of the proposed approach suggests that these sequence and structural features could potentially be used in computational protein design.

Availability and implementation: http://github.com/KYZ-LSB/ProTerS-FitFun

Supplementary information: Supplementary data are available at Bioinformatics online.
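The tripeptide-level preferences described in the Motivation can be illustrated with a toy statistical table. This is a minimal sketch under stated assumptions, not ProFitFun's actual features or model: each residue that has both an N-terminal and a C-terminal neighbor is reduced to a discretized state (phi bin, psi bin, relative surface accessibility class); state counts are collected per tripeptide from experimental structures; and a model is scored by the log-odds of its residues' states given their tripeptide context versus the background. The 30° bin width, the 0.2/0.5 accessibility cutoffs, the smoothing scheme and all names (`TripeptidePreferences`, `angle_bin`, `rsa_bin`, `residue_states`) are hypothetical.

```python
import math
from collections import Counter

ANGLE_BIN_WIDTH = 30  # degrees; hypothetical bin width
N_ANGLE_BINS = 360 // ANGLE_BIN_WIDTH
RSA_THRESHOLDS = (0.2, 0.5)  # hypothetical buried / intermediate / exposed cutoffs
N_STATES = N_ANGLE_BINS * N_ANGLE_BINS * (len(RSA_THRESHOLDS) + 1)


def angle_bin(angle_deg):
    """Map a dihedral angle in degrees onto one of N_ANGLE_BINS bins."""
    return int(((angle_deg + 180.0) % 360.0) // ANGLE_BIN_WIDTH)


def rsa_bin(rsa):
    """Discretize relative surface accessibility (0-1) into three classes."""
    return sum(rsa >= t for t in RSA_THRESHOLDS)


def residue_states(seq, phi, psi, rsa):
    """Yield (tripeptide, state) for each residue that has both neighbors."""
    for i in range(1, len(seq) - 1):
        state = (angle_bin(phi[i]), angle_bin(psi[i]), rsa_bin(rsa[i]))
        yield seq[i - 1:i + 2], state


class TripeptidePreferences:
    """Toy log-odds table of residue states conditioned on tripeptide context."""

    def __init__(self, prior_weight=1.0, background_pseudocount=1.0):
        self.prior_weight = prior_weight
        self.background_pseudocount = background_pseudocount
        self.pair_counts = Counter()   # (tripeptide, state) -> count
        self.tri_counts = Counter()    # tripeptide -> count
        self.state_counts = Counter()  # state -> count
        self.total = 0

    def fit(self, structures):
        """structures: iterable of (seq, phi, psi, rsa) from experimental structures."""
        for seq, phi, psi, rsa in structures:
            for tri, state in residue_states(seq, phi, psi, rsa):
                self.pair_counts[(tri, state)] += 1
                self.tri_counts[tri] += 1
                self.state_counts[state] += 1
                self.total += 1
        return self

    def background(self, state):
        """Smoothed background probability P(state) over all contexts."""
        b = self.background_pseudocount
        return (self.state_counts[state] + b) / (self.total + b * N_STATES)

    def log_odds(self, tri, state):
        """log P(state | tripeptide) / P(state), with P(state | tri) shrunk toward P(state)."""
        p_bg = self.background(state)
        w = self.prior_weight
        p_cond = (self.pair_counts[(tri, state)] + w * p_bg) / (self.tri_counts[tri] + w)
        return math.log(p_cond / p_bg)

    def score(self, seq, phi, psi, rsa):
        """Mean per-residue log-odds; higher means more native-like (toy score)."""
        values = [self.log_odds(t, s) for t, s in residue_states(seq, phi, psi, rsa)]
        return sum(values) / len(values) if values else 0.0
```

The abstract states that ProFitFun feeds its preferences into a machine learning model; the plain average in `score` is only a simplified stand-in for that step. Shrinking the conditional estimate toward the background keeps states never seen in a given context from scoring above the background.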

Updated: 2021-09-16