Abstract
Understanding how proteins encode ligand specificity is fascinating and similar in importance to deciphering the genetic code. For protein–ligand recognition, the combination of an almost infinite variety of interfacial shapes and patterns of chemical groups makes the problem especially challenging. Here we analyze data across non-homologous proteins in complex with small biological ligands to address observations made in our inhibitor discovery projects: that proteins favor donating H-bonds to ligands and avoid using groups with both H-bond donor and acceptor capacity. The resulting clear and significant chemical group matching preferences elucidate the code for protein-native ligand binding, similar to the dominant patterns found in nucleic acid base-pairing. On average, 90% of the keto and carboxylate oxygens occurring in the biological ligands formed direct H-bonds to the protein. A two-fold preference was found for protein atoms to act as H-bond donors and ligand atoms to act as acceptors, and 76% of all intermolecular H-bonds involved an amine donor. Together, the tight chemical and geometric constraints associated with satisfying donor groups generate a hydrogen-bonding lock that can be matched only by ligands bearing the right acceptor-rich key. Measuring an index of H-bond preference based on the observed chemical trends proved sufficient to predict other protein–ligand complexes and can be used to guide molecular design. The resulting Hbind and Protein Recognition Index software packages are being made available for rigorously defining intermolecular H-bonds and measuring the extent to which H-bonding patterns in a given complex match the preference key.
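To make the donor/acceptor bookkeeping behind these statistics concrete, the following is a minimal sketch of how the protein-versus-ligand donor preference and the share of amine donors could be tallied from a list of intermolecular H-bonds. It is an illustration only, not the Hbind or Protein Recognition Index implementation: the record layout, the function names, and the six example rows are all hypothetical, and the toy numbers are not data from the study.

```python
from collections import Counter

# Hypothetical H-bond records: (side donating the H-bond, donor group, acceptor group).
# These rows are invented for illustration and are NOT data from the study.
HBONDS = [
    ("protein", "amine", "carboxylate O"),
    ("protein", "amine", "keto O"),
    ("protein", "hydroxyl", "carboxylate O"),
    ("ligand", "hydroxyl", "carbonyl O"),
    ("protein", "amine", "phosphate O"),
    ("ligand", "amine", "carboxylate O"),
]


def donor_side_counts(hbonds):
    """Count how many H-bonds are donated by the protein and by the ligand."""
    return Counter(side for side, _donor, _acceptor in hbonds)


def donor_ratio(hbonds):
    """Ratio of protein-donated to ligand-donated H-bonds."""
    counts = donor_side_counts(hbonds)
    if counts["ligand"] == 0:
        raise ValueError("no ligand-donated H-bonds; ratio is undefined")
    return counts["protein"] / counts["ligand"]


def amine_donor_fraction(hbonds):
    """Fraction of all H-bonds whose donor group is an amine."""
    if not hbonds:
        raise ValueError("empty H-bond list")
    amine = sum(1 for _side, donor, _acceptor in hbonds if donor == "amine")
    return amine / len(hbonds)


if __name__ == "__main__":
    print("donor counts:", dict(donor_side_counts(HBONDS)))
    print("protein:ligand donor ratio:", donor_ratio(HBONDS))
    print("amine donor fraction:", round(amine_donor_fraction(HBONDS), 3))
```

In a real analysis the records would come from a geometric H-bond definition applied to each complex; here they are hard-coded so that only the counting step is shown.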
Abbreviations
- 3D: Three-dimensional
- CATH: Class Architecture Topology Homologous superfamily
- H-bonds: Hydrogen bonds
- MMFF94: Merck Molecular Force Field
- PDB: Protein Data Bank
- PRI: Protein Recognition Index
Acknowledgements
This research was supported by funding from the Great Lakes Fishery Commission (Project ID: 2015_KUH_54031). We gratefully acknowledge OpenEye Scientific Software (Santa Fe, NM) for providing academic licenses for the use of their QUACPAC (molcharge) and OEChem software. We also thank the following lab graduates for their contributions to this research: Dr. Maria Zavodszky (now at GE Global Research Center), who observed that hydroxyl-rich ligands tended to result in false positives in screening, Dr. Amy Cayemberg McQuade (now at Carroll University) for carrying out the statistical analysis of protein-water-ligand hydrogen-bond bridges, and Dr. Jeffrey VanVoorst (now at Veritas Technologies, LLC) for developing the non-homologous dataset of 136 protein-small molecule complexes analyzed here. We thank Dr. Michael Feig (Michigan State University) for discussions on the biological basis for the prevalence of oxygen versus nitrogen in natural ligands and also appreciate the data he provided on the atomic composition of metabolites in Mycoplasma genitalium.
Cite this article
Raschka, S., Wolf, A.J., Bemister-Buffington, J. et al. Protein–ligand interfaces are polarized: discovery of a strong trend for intermolecular hydrogen bonds to favor donors on the protein side with implications for predicting and designing ligand complexes. J Comput Aided Mol Des 32, 511–528 (2018). https://doi.org/10.1007/s10822-018-0105-2
DOI: https://doi.org/10.1007/s10822-018-0105-2