Stacking Gaussian processes to improve $pK_a$ predictions in the SAMPL7 challenge

Raddi, Robert M.; Voelz, Vincent A.

doi:10.1007/s10822-021-00411-8

Published: 07 August 2021

Volume 35, pages 953–961, (2021)

Journal of Computer-Aided Molecular Design

Abstract

Accurate predictions of acid dissociation constants are essential to rational molecular design in the pharmaceutical industry and elsewhere. There has been much interest in developing new machine learning methods that can produce fast and accurate $pK_{a}$ predictions for arbitrary species, as well as estimates of prediction uncertainty. Previously, as part of the SAMPL6 community-wide blind challenge, Bannan et al. approached the problem by using Gaussian process regression to predict microscopic $pK_{a}$ values, from which macroscopic $pK_{a}$ values can be analytically computed (Bannan et al. in J Comput-Aided Mol Des 32:1165–1177). While this method can make reasonably quick and accurate predictions using a small training set, accuracy was limited because the training set did not cover a sufficiently broad range of chemical space (e.g., polyprotic acids were not included). Here, to address this issue, we construct a deep Gaussian process (GP) model that can include more features without invoking the curse of dimensionality. We trained both a standard GP and a deep GP model using a database of approximately 3500 small molecules curated from public sources, filtered by similarity to targets. We tested the models on both the SAMPL6 challenge and the more recent SAMPL7 challenge, which similarly had few ionizable sites and/or environments in common between the test set and the previous training set. The results show that while the deep GP model made only minor improvements over the standard GP model for SAMPL6 predictions, it made significant improvements over the standard GP model in SAMPL7 macroscopic predictions, achieving a mean absolute error (MAE) of 1.5 $pK_{a}$ units.
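The abstract notes that macroscopic $pK_{a}$ values can be computed analytically from microscopic ones. As a minimal sketch of that idea, the Python snippet below applies the textbook relations for a molecule with two ionizable sites: the microscopic $K_a$ values add for the first dissociation, and their reciprocals add for the second. The function name, argument names, and numerical values are hypothetical illustrations, and this is not necessarily the procedure used by Bannan et al. or by the authors.

```python
import math


def macro_pkas_two_site(pk_a, pk_b, pk_ab, pk_ba):
    """Macroscopic pKa1 and pKa2 of a two-site acid from four microscopic pKa values.

    pk_a, pk_b   : micro pKa for losing the first proton from site A or site B
    pk_ab, pk_ba : micro pKa for losing the second proton
                   (from B after A is deprotonated, from A after B is deprotonated)
    """
    # First dissociation: the two microscopic Ka values add.
    k1 = 10.0 ** (-pk_a) + 10.0 ** (-pk_b)
    # Second dissociation: the reciprocals of the microscopic Ka values add.
    k2 = 1.0 / (10.0 ** pk_ab + 10.0 ** pk_ba)
    return -math.log10(k1), -math.log10(k2)


# Hypothetical symmetric diacid (illustrative numbers, not taken from the paper).
pka1, pka2 = macro_pkas_two_site(4.0, 4.0, 5.0, 5.0)
print(round(pka1, 3), round(pka2, 3))  # prints: 3.699 5.301
```

For two equivalent sites, the macroscopic values are shifted from the microscopic ones by $\log_{10} 2 \approx 0.30$, which is the purely statistical factor expected for a symmetric diacid.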

References

Gleeson MP (2008) Generation of a set of simple, interpretable ADMET rules of thumb. J Med Chem 51:817–834
Manallack DT, Prankerd RJ, Yuriev E, Oprea TI, Chalmers DK (2013) The significance of acid/base properties in drug discovery. Chem Soc Rev 42:485–496
SAMPL Challenge. https://www.samplchallenges.org. Accessed 1 Aug 2021
Işık M, Bergazin TD, Fox T, Rizzi A, Chodera JD, Mobley DL (2020) Assessing the accuracy of octanol-water partition coefficient predictions in the SAMPL6 Part II log P challenge. J Comput-Aided Mol Des 34:1–36
Fraczkiewicz R, Lobell M, Goller AH, Krenz U, Schoenneis R, Clark RD, Hillisch A (2015) Best of both worlds: combining pharma data and state of the art modeling technology to improve in silico pKa prediction. J Chem Inf Model 55:389–397
Shields GC, Seybold PG (2013) Computational approaches for the prediction of pKa values. CRC Press, Boca Raton
Fraczkiewicz R (2013) In silico prediction of ionization. Elsevier, Amsterdam
Bannan CC, Mobley DL, Skillman AG (2018) SAMPL6 challenge results from $pK_a$ predictions based on a general Gaussian process model. J Comput Aided Mol Des 32:1165–1177
pKa-Prospector 1.1.5.1: OpenEye Scientific Software, Santa Fe, NM. http://www.eyesopen.com. Accessed 1 Aug 2021
Gunner MR, Murakami T, Rustenburg AS, Işık M, Chodera JD (2020) Standard state free energies, not pKas, are ideal for describing small molecule protonation and tautomeric states. J Comput-Aided Mol Des 34:1–13
Halgren TA (1996) Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J Comput Chem 17:490–519
Jakalian A, Jack DB, Bayly CI (2002) Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. J Comput Chem 23:1623–1641
Wagner J et al. (2020) openforcefield/openforcefield: 0.8.0 virtual sites and bond interpolation. https://doi.org/10.5281/zenodo.4121930
Landrum G (2006) RDKit: Open-source cheminformatics
Software os cheminformatics software: molecular modeling software. OpenEye Scientific. http://www.eyesopen.com. Accessed 1 Aug 2021
Shrake A, Rupley JA (1973) Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J Mol Biol 79:351–371
Xing L, Glen RC, Clark RD (2003) Predicting pKa by molecular tree structured fingerprints and PLS. J Chem Inf Comput Sci 43:870–879
Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50:742–754
GPy (2012) GPy: a Gaussian process framework in python. http://github.com/SheffieldML/GPy. Accessed 1 Aug 2021
Damianou A, Lawrence N (2013) Deep gaussian processes. In: Artificial intelligence and statistics, pp 207–215
Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Duvenaud D (2014) The Kernel cookbook: advice on covariance functions. https://www.cs.toronto.edu/duvenaud/cookbook. Accessed 1 Aug 2021
Yang Q, Li Y, Yang J-D, Liu Y, Zhang L, Luo S, Cheng J-P (2020) Holistic prediction of pKa in diverse solvents based on machine learning approach. Angew Chem 132(43):19444–19453
Raddi R, Voelz V (2021) pKa database for stacking Gaussian Processes to improve pKa predictions in the SAMPL7 challenge. ChemRxiv. https://doi.org/10.5281/zenodo.5027418
Sushko I et al (2011) Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput-Aided Mol Des 25:533–554
Wishart DS et al (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 46:D1074–D1082
Settimo L, Bellman K, Knegtel RM (2014) Comparison of the accuracy of experimental and predicted pKa values of basic and acidic compounds. Pharm Res 31:1082–1095
Titsias M (2009) Variational learning of inducing variables in sparse Gaussian processes. Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, PMLR 5:567–574.
Francisco KR, Varricchio C, Paniak TJ, Kozlowski MC, Brancale A, Ballatore C (2021) Structure property relationships of N-acylsulfonamides and related bioisosteres. Eur J Med Chem 218:113399
Caine BA, Bronzato M, Popelier PL (2019) Experiment stands corrected: accurate prediction of the aqueous pKa values of sulfonamide drugs using equilibrium bond lengths. Chem Sci 10:6368–6381
Nigam A, Pollice R, Hurley MFD, Hickman RJ, Aldeghi M, Yoshikawa N, Chithrananda S, Voelz VA, Aspuru-Guzik A (2021) Assigning confidence to molecular property prediction. Expert Opin Drug Discovery. https://doi.org/10.1080/17460441.2021.1925247

Acknowledgements

RMR and VAV are supported by National Institutes of Health Grant R01GM123296. We thank the National Institutes of Health for its support of the SAMPL project via R01GM124270 to David L. Mobley (UC Irvine).

Author information

Authors and Affiliations

Department of Chemistry, Temple University, Philadelphia, PA, 19122, USA
Robert M. Raddi & Vincent A. Voelz

Corresponding author

Correspondence to Vincent A. Voelz.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 767 KB)

About this article

Cite this article

Raddi, R.M., Voelz, V.A. Stacking Gaussian processes to improve $pK_a$ predictions in the SAMPL7 challenge. J Comput Aided Mol Des 35, 953–961 (2021). https://doi.org/10.1007/s10822-021-00411-8

Received: 12 April 2021
Accepted: 05 July 2021
Published: 07 August 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s10822-021-00411-8
