Stacking Gaussian processes to improve \(pK_{a}\) predictions in the SAMPL7 challenge
Journal of Computer-Aided Molecular Design (IF 3.5), Pub Date: 2021-08-07, DOI: 10.1007/s10822-021-00411-8
Robert M Raddi, Vincent A Voelz

Accurate predictions of acid dissociation constants are essential to rational molecular design in the pharmaceutical industry and elsewhere. There has been much interest in developing new machine learning methods that can produce fast and accurate \(pK_{a}\) predictions for arbitrary species, along with estimates of prediction uncertainty. Previously, as part of the SAMPL6 community-wide blind challenge, Bannan et al. used Gaussian process regression to predict microscopic \(pK_{a}\)s, from which macroscopic \(pK_{a}\) values can be computed analytically (Bannan et al. in J Comput-Aided Mol Des 32:1165–1177). While this method can make reasonably quick and accurate predictions using a small training set, its accuracy was limited because the training set did not cover a sufficiently broad range of chemical space (e.g., polyprotic acids). Here, to address this issue, we construct a deep Gaussian process (GP) model that can include more features without invoking the curse of dimensionality. We trained both a standard GP and a deep GP model using a database of approximately 3500 small molecules curated from public sources and filtered by similarity to the targets. We tested the models on both the SAMPL6 challenge and the more recent SAMPL7 challenge, which presented a similar lack of overlap in ionizable sites and/or environments between the test set and the previous training set. The results show that while the deep GP model made only minor improvements over the standard GP model for SAMPL6 predictions, it made significant improvements over the standard GP model in SAMPL7 macroscopic predictions, achieving a mean absolute error (MAE) of 1.5 \(pK_{a}\) units.
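
The abstract notes that macroscopic \(pK_{a}\) values can be computed analytically from microscopic ones. As a minimal illustration, not the authors' implementation, the Python sketch below covers only the two simplest textbook cases: one protonated microstate that deprotonates to several tautomers (the microscopic \(K_{a}\)s add), and several protonated tautomers that deprotonate to one microstate (the reciprocals of the microscopic \(K_{a}\)s add). The function names are hypothetical, and the general treatment over full microstate networks is not shown.

```python
import math


def macro_pka_one_to_many(micro_pkas):
    """Macroscopic pKa when ONE protonated microstate loses a proton
    to several deprotonated tautomers: K_macro = sum_i K_i."""
    k_macro = sum(10.0 ** (-pka) for pka in micro_pkas)
    return -math.log10(k_macro)


def macro_pka_many_to_one(micro_pkas):
    """Macroscopic pKa when SEVERAL protonated tautomers lose a proton
    to one deprotonated microstate: 1 / K_macro = sum_i (1 / K_i)."""
    inverse_k_macro = sum(10.0 ** pka for pka in micro_pkas)
    return math.log10(inverse_k_macro)


if __name__ == "__main__":
    # Two equivalent sites with microscopic pKa 4.0 (illustrative values only).
    print(round(macro_pka_one_to_many([4.0, 4.0]), 3))
    print(round(macro_pka_many_to_one([4.0, 4.0]), 3))
```

With two equivalent sites, the macroscopic value shifts from the microscopic one by \(\log_{10} 2\): downward in the first case and upward in the second.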




Updated: 2021-08-09