Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study, Journal of Cheminformatics



Journal of Cheminformatics (IF 7.1), Pub Date: 2021-05-13, DOI: 10.1186/s13321-021-00516-0
Morgan Thomas₁, Robert T Smith₂, Noel M O'Boyle₂, Chris de Graaf₂, Andreas Bender₁


Deep generative models have shown the ability to devise both valid and novel chemistry, which could significantly accelerate the identification of bioactive compounds. Many current models, however, use molecular descriptors or ligand-based predictive methods to guide molecule generation towards a desirable property space. This restricts their application to relatively data-rich targets and neglects those where too little data is available to train a predictor. Moreover, ligand-based approaches often bias molecule generation towards previously established chemical space, thereby limiting their ability to identify truly novel chemotypes. In this work, we assess molecular docking via Glide (a structure-based approach) as a scoring function to guide the deep generative model REINVENT, and compare model performance and behaviour to a ligand-based scoring function. Additionally, we modify the previously published MOSES benchmarking dataset to remove any induced bias towards non-protonatable groups. We also propose a new metric to measure dataset diversity, which is less confounded by the distribution of heavy atom count than the commonly used internal diversity metric. We found that when optimizing the docking score against DRD2, the model improves predicted ligand affinity beyond that of known DRD2 active molecules. In addition, generated molecules occupy complementary chemical and physicochemical space compared to the ligand-based approach, and novel physicochemical space compared to known DRD2 active molecules. Furthermore, the structure-based approach learns to generate molecules that satisfy crucial residue interactions, which is information only available when taking protein structure into account.
Overall, this work demonstrates the advantage of using molecular docking to guide de novo molecule generation over ligand-based predictors with respect to predicted affinity, novelty, and the ability to identify key interactions between ligand and protein target. Practically, this approach has applications in early hit generation campaigns to enrich a virtual library towards a particular target, and also in novelty-focused projects, where de novo molecule generation either has no prior ligand knowledge available or should not be biased by it.
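
For readers unfamiliar with the "commonly used internal diversity metric" mentioned in the abstract, the following is a minimal sketch of one common formulation: one minus the mean pairwise Tanimoto similarity over a set of fingerprints. It is not the authors' code and it does not implement the new diversity metric they propose, which the abstract does not specify. The function names, the use of plain Python sets as stand-ins for fingerprint on-bits, and the choice to average over distinct pairs only are all assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def internal_diversity(fingerprints):
    """One minus the mean Tanimoto similarity over all distinct pairs.

    Assumption: self-pairs are excluded; some implementations average over
    the full similarity matrix instead, which gives slightly lower values.
    """
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0  # fewer than two molecules: no pairwise diversity to measure
    mean_similarity = sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


# Hypothetical toy fingerprints (sets of on-bit indices), not real molecules.
toy_fingerprints = [
    frozenset({1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({7, 8, 9}),
]
print(round(internal_diversity(toy_fingerprints), 3))
```

In practice the sets would be replaced by fingerprints computed with a cheminformatics toolkit. Because every pair contributes equally regardless of molecule size, a metric of this form can be influenced by the size distribution of the set, which is the kind of confounding the abstract says the proposed metric reduces.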


Last updated: 2021-05-13
