Pair Potentials as Machine Learning Features. Journal of Chemical Theory and Computation

Journal of Chemical Theory and Computation (IF 5.7). Pub Date: 2020-06-19. DOI: 10.1021/acs.jctc.9b01246
Jun Pei, Lin Frank Song, Kenneth M. Merz

Atom pairwise potential functions make up an essential part of many scoring functions for protein decoy detection. With the development of machine learning (ML) tools, there are multiple ways to combine potential functions to create novel ML models and methods. Potential function parameters can be easily extracted; however, it is usually hard to obtain the calculated atom pairwise energies directly from scoring functions. Amber, one of the most popular suites of modeling programs, has an extensive history and library of force field potential functions. In this work, we directly used the force field parameters in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise energies for different interactions. Two sets of structures (a single amino acid set and a dipeptide set) were used to evaluate the performance of our encoded Amber potentials. Comparing the energy terms obtained from our encoding with those from Amber, we find differences within ±0.06 kcal/mol for all tested structures. Previously, we showed that the Random Forest (RF) model can help to emphasize the more important atom pairwise interactions and ignore insignificant ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919−1929]. Here, as an example of combining ML methods with traditional potential functions, we followed the same workflow to combine RF models with force field potential functions from Amber. To determine the performance of our RF models with force field potential functions, 224 different protein native-decoy systems were used as our training and testing sets. We find that the RF models with ff94 and ff14SB force field parameters outperformed all other scoring functions considered in this work (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP) for native structure detection, and that they performed similarly in detecting the best decoy.
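The abstract does not reproduce the energy expressions, so the following is a minimal Python sketch, assuming the standard Amber form for nonbonded terms: a 12-6 Lennard-Jones term built from per-type Rmin/2 and ε values combined with the Lorentz-Berthelot rules, plus a bare Coulomb term with Amber's 332.0522 kcal·Å/(mol·e²) prefactor (18.2223²). It sums these energies per atom-type pair to form a feature vector. The function names, the 8 Å cutoff, and the parameter and charge values in the usage example are illustrative assumptions, not the FFENCODER code. The sketch also ignores the bonded exclusions and 1-4 scaling that Amber applies, so its totals will not match Amber's reported energy terms.

```python
import math
from collections import defaultdict

# Amber-style Coulomb prefactor (18.2223**2): gives kcal/mol for charges
# in elementary charges and distances in angstroms.
COULOMB_K = 332.0522


def lj_energy(r, rmin_half_i, eps_i, rmin_half_j, eps_j):
    """12-6 Lennard-Jones energy (kcal/mol) using Lorentz-Berthelot combining."""
    rmin_ij = rmin_half_i + rmin_half_j      # sum of the two Rmin/2 radii
    eps_ij = math.sqrt(eps_i * eps_j)        # geometric mean of well depths
    ratio6 = (rmin_ij / r) ** 6
    return eps_ij * (ratio6 * ratio6 - 2.0 * ratio6)


def coulomb_energy(r, q_i, q_j):
    """Bare Coulomb energy (kcal/mol); no dielectric, cutoff, or 1-4 scaling."""
    return COULOMB_K * q_i * q_j / r


def pair_features(atoms, params, cutoff=8.0):
    """Sum vdW and electrostatic energies per (type, type) pair within a cutoff.

    atoms:  list of (atom_type, charge, (x, y, z)) tuples (hypothetical input format)
    params: dict mapping atom_type -> (rmin_half, epsilon)
    Assumption: every atom pair is included (no bonded exclusions).
    """
    features = defaultdict(float)
    for a in range(len(atoms)):
        type_a, q_a, pos_a = atoms[a]
        for b in range(a + 1, len(atoms)):
            type_b, q_b, pos_b = atoms[b]
            r = math.dist(pos_a, pos_b)
            if r > cutoff:
                continue
            key = tuple(sorted((type_a, type_b)))  # order-independent pair label
            features[(key, "vdw")] += lj_energy(r, *params[type_a], *params[type_b])
            features[(key, "elec")] += coulomb_energy(r, q_a, q_b)
    return dict(features)


if __name__ == "__main__":
    # Illustrative values in the style of Amber carbonyl C and O types;
    # check them against the actual parameter files before relying on them.
    params = {"C": (1.9080, 0.0860), "O": (1.6612, 0.2100)}
    atoms = [("C", 0.5973, (0.0, 0.0, 0.0)), ("O", -0.5679, (0.0, 0.0, 3.5))]
    print(pair_features(atoms, params))
```

Keying features by atom-type pair and interaction type is what lets a downstream model such as an RF weight some interactions more heavily than others, rather than only seeing a single summed energy.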
By including best-decoy-to-decoy comparisons when building our RF models, we were able to generate models that outperformed the scoring functions tested herein in both accuracy and best-decoy detection, again showing the performance and flexibility of our RF models in tackling this problem. Finally, the importance of the RF algorithm and of the force field parameters was also tested, and the comparison results suggest that both are important: the ML scoring function achieves its best performance only when the two are combined. All code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
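The pairwise training and evaluation setup can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each structure is represented by a feature vector (for example, per-atom-type-pair energies), a training example is the difference between two structures' vectors labeled 1 when the first is the better one, and the "best decoy" is assumed to be the decoy with the lowest RMSD to the native. The helper names and the round-robin ranking rule are hypothetical, and `energy_preference` is a toy stand-in for a trained model; in practice one would instead wrap a classifier's probability output, such as scikit-learn's `RandomForestClassifier.predict_proba`.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
System = Tuple[Vector, List[Vector], List[float]]  # (native, decoys, decoy RMSDs)


def difference(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise feature difference a - b."""
    return [x - y for x, y in zip(a, b)]


def comparison_pairs(native: Vector, decoys: List[Vector], rmsds: List[float],
                     include_best_decoy: bool = True) -> Tuple[List[Vector], List[int]]:
    """Build symmetric training examples; label 1 means the first structure is better."""
    better_worse = [(native, d) for d in decoys]
    if include_best_decoy:
        # Assumption: the best decoy is the one with the lowest RMSD to the native.
        best = min(range(len(decoys)), key=rmsds.__getitem__)
        better_worse += [(decoys[best], d) for i, d in enumerate(decoys) if i != best]
    X, y = [], []
    for good, bad in better_worse:
        X.append(difference(good, bad))
        y.append(1)
        X.append(difference(bad, good))
        y.append(0)
    return X, y


def pick_best(structures: List[Vector], prefers: Callable[[Vector], float]) -> int:
    """Round-robin ranking: index of the structure that wins the most comparisons."""
    wins = [0] * len(structures)
    for i, a in enumerate(structures):
        for j, b in enumerate(structures):
            if i != j and prefers(difference(a, b)) > 0.5:
                wins[i] += 1
    return max(range(len(structures)), key=wins.__getitem__)


def evaluate(systems: List[System], prefers: Callable[[Vector], float]) -> Tuple[float, float]:
    """Return (native detection rate, best-decoy detection rate) over all systems."""
    native_hits = best_hits = 0
    for native, decoys, rmsds in systems:
        if pick_best([native] + decoys, prefers) == 0:
            native_hits += 1
        best = min(range(len(decoys)), key=rmsds.__getitem__)
        if pick_best(decoys, prefers) == best:
            best_hits += 1
    return native_hits / len(systems), best_hits / len(systems)


def energy_preference(diff: Vector) -> float:
    """Toy stand-in for a trained model: prefer the structure with lower total energy."""
    return 1.0 if sum(diff) < 0 else 0.0
```

Calling `comparison_pairs(..., include_best_decoy=False)` yields a native-versus-decoy-only training set, which is the baseline the abstract improves on by adding best-decoy-to-decoy comparisons.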

Updated: 2020-08-11