Predicting partition coefficients for the SAMPL7 physical property challenge using the ClassicalGSG method
Journal of Computer-Aided Molecular Design (IF 3.0), Pub Date: 2021-06-28, DOI: 10.1007/s10822-021-00400-x
Nazanin Donyapour 1 , Alex Dickson 1, 2

The prediction of \(\log P\) values is one part of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenges. Here, we use a molecular graph representation method called Geometric Scattering for Graphs (GSG) to transform atomic attributes into molecular features. The atomic attributes used here are parameters from classical molecular force fields, including partial charges and Lennard–Jones interaction parameters. The molecular features from GSG are used as inputs to neural networks that are trained using a “master” dataset comprising over 41,000 unique \(\log P\) values. The specific molecular targets in the SAMPL7 \(\log P\) prediction challenge were unique in that they all contained a sulfonyl moiety. This motivated a set of ClassicalGSG submissions in which predictors were trained on different subsets of the master dataset, filtered according to chemical types and/or the presence of the sulfonyl moiety. We find that our ranked prediction obtained 5th place with an RMSE of 0.77 \(\log P\) units and an MAE of 0.62, while one of our non-ranked predictions achieved first place among all submissions with an RMSE of 0.55 and an MAE of 0.44. After the conclusion of the challenge, we also examined the performance of open-source force field parameters that allow for an end-to-end \(\log P\) predictor model: General AMBER Force Field (GAFF), Universal Force Field (UFF), Merck Molecular Force Field 94 (MMFF94) and Ghemical. We find that ClassicalGSG models trained with atomic attributes from MMFF94 can yield more accurate predictions than those trained with CGenFF atomic attributes.
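
To make the step from atomic attributes to molecular features more concrete, here is a minimal sketch of the general geometric scattering idea on a graph. It is not the authors' ClassicalGSG implementation. It assumes lazy random walk diffusion wavelets and statistical moments taken over nodes, and it handles a single per-atom signal, whereas a full pipeline would presumably repeat this for each attribute and concatenate the results. The function names, the three-atom toy graph, the charge values, and the default parameters are all hypothetical.

```python
# Simplified geometric scattering on a small graph (illustrative sketch only).
# All names, the toy graph, and the toy charges below are assumptions.

def lazy_walk(adj):
    """Lazy random walk matrix P = (I + A D^-1) / 2 for adjacency matrix A."""
    n = len(adj)
    deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[0.5 * ((i == j) + (adj[i][j] / deg[j] if deg[j] else 0.0))
             for j in range(n)] for i in range(n)]

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def wavelet_responses(p, x, num_scales):
    """Return [Psi_0 x, ..., Psi_J x] with Psi_0 = I - P and
    Psi_j = P^(2^(j-1)) - P^(2^j) for j >= 1."""
    diffused = [x]  # diffused[t] holds P^t x
    for _ in range(2 ** num_scales):
        diffused.append(matvec(p, diffused[-1]))
    out = [[a - b for a, b in zip(diffused[0], diffused[1])]]
    for j in range(1, num_scales + 1):
        lo, hi = diffused[2 ** (j - 1)], diffused[2 ** j]
        out.append([a - b for a, b in zip(lo, hi)])
    return out

def moments(values, max_moment):
    """Sum of v^q over nodes for q = 1..max_moment (invariant to node order)."""
    return [sum(v ** q for v in values) for q in range(1, max_moment + 1)]

def scattering_features(adj, x, num_scales=2, max_moment=2):
    """Zeroth-, first-, and second-order scattering moments of node signal x."""
    p = lazy_walk(adj)
    feats = moments(x, max_moment)  # zeroth order: moments of the raw signal
    first = [[abs(v) for v in r] for r in wavelet_responses(p, x, num_scales)]
    for r in first:  # first order: moments of |Psi_j x|
        feats += moments(r, max_moment)
    for j, r in enumerate(first):  # second order: |Psi_j' |Psi_j x||, j < j'
        second = wavelet_responses(p, r, num_scales)
        for jp in range(j + 1, num_scales + 1):
            feats += moments([abs(v) for v in second[jp]], max_moment)
    return feats

# Toy three-atom path graph with made-up partial charges on each atom.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
charges = [-0.5, 1.0, -0.5]
features = scattering_features(adj, charges)
print(len(features), features[:2])
```

Because every feature is a sum over nodes, the resulting vector does not depend on how the atoms are numbered, which is what makes such features usable as fixed-length inputs to a neural network.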




Updated: 2021-06-28