Abstract
We use large-scale mutagenesis data and computer simulations to quantify the mutational robustness of protein-coding genes by taking into account constraints arising from protein function and the genetic code. Analyses of the distribution of amino acid substitutions from 18 mutagenesis studies revealed an average of 45% of neutral variants; while mutagenesis data of 12 proteins artificially designed under no other constraints but stability, reach an average of 60%. Simulations using a lattice protein model allow us to contrast these estimates to the expected mutational robustness of protein families by generating unbiased samples of foldable sequences, which we find to have 30% of neutral variants. In agreement with mutagenesis data of designed proteins, the model shows that maximally robust protein families might access up to twice the amount of neutral variants observed in the unbiased samples (i.e. 60%). A biophysical model of protein-ligand binding suggests that constraints associated to molecular function have only a moderate impact on robustness of approximately 5 to 10% of neutral variants; and that the direction of this effect depends on the relation between functional performance and thermodynamic stability. Although the genetic code constraints the access of a gene’s nucleotide sequence to only 30% of the full distribution of amino acid mutations, it provides an extra 15 to 20% of neutral variants to the estimations above, such that the expected, observed, and maximal robustness of protein-coding genes are approximately 50, 65, and 75%, respectively. We discuss our results in the light of three main hypothesis put forward to explain the existence of mutationally robust genes.
Similar content being viewed by others
References
Azevedo RB, Lohaus R, Srinivasan S, Dang KK, Burch CL (2006) Sexual reproduction selects for robustness and negative epistasis in artificial gene networks. Nature 440(7080):87–90
Babajide A, Farber R, Hofacker IL, Inman J, Lapedes AS, Stadler PF (2001) Exploring protein sequence space using knowledge-based potentials. J Theor Biol 212(1):35–46
Babajide A, Hofacker IL, Sippl MJ, Stadler PF (1997) Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force. Fold Des 2(5):261–269
Bloom JD, Lu Z, Chen D, Raval A, Venturelli OS, Arnold FH (2007) Evolution favors protein mutational robustness in sufficiently large populations. BMC Biol 5(1):29
Bornberg-Bauer E (1997) How are model protein structures distributed in sequence space? Biophys J 73(5):2393–2403
Bornberg-Bauer E, Chan HS (1999) Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc Natl Acad Sci 96(19):10689–10694
Boucher JI, Bolon DN, Tawfik DS (2016) Quantifying and understanding the fitness effects of protein mutations: laboratory versus nature. Protein Sci 25(7):1219–1226
Bowie JU, Reidhaar-Olson JF, Lim WA, Sauer RT (1990) Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science 247(4948):1306–1310
Bratulic S, Gerber F, Wagner A (2015) Mistranslation drives the evolution of robustness in tem-1 \(\beta\)-lactamase. Proc Natl Acad Sci 112(41):12758–12763
Chan HS, Dill KA (1991) “sequence space soup’’ of proteins and copolymers. J Chem Phys 95(5):3775–3787
Chan HS, Dill KA (1996) Comparing folding codes for proteins and polymers. Proteins-Struct Funct Genet 24(3):335–344
Chothia C (1976) The nature of the accessible and buried surfaces in proteins. J Mol Biol 105(1):1–12
DePristo MA, Weinreich DM, Hartl DL (2005) Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet 6(9):678–687
Dill KA, Bromberg S, Yue K, Chan HS, Ftebig KM, Yee DP, Thomas PD (1995) Principles of protein folding-a perspective from simple exact models. Protein Sci 4(4):561–602
Doi N, Kakukawa K, Oishi Y, Yanagawa H (2005) High solubility of random-sequence proteins consisting of five kinds of primitive amino acids. Protein Eng Des Sel 18(6):279–284
Drummond DA, Silberg JJ, Meyer MM, Wilke CO, Arnold FH (2005) On the conservative nature of intragenic recombination. Proc Natl Acad Sci USA 102(15):5380–5385
Drummond DA, Wilke CO (2008) Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134(2):341–352
Dyson HJ, Wright PE (2002) Coupling of folding and binding for unstructured proteins. Curr Opin Struct Biol 12(1):54–60
Echave J, Wilke CO (2017) Biophysical models of protein evolution: understanding the patterns of evolutionary sequence divergence. Annu Rev Biophys 46:85–103
El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A et al (2019) The pfam protein families database in 2019. Nucleic Acids Res 47(D1):D427–D432
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF (2019) Mavedb: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 20(1):1–11
Eyre-Walker A, Keightley PD (2007) The distribution of fitness effects of new mutations. Nat Rev Genet 8(8):610
Ferrada E (2019) The site-specific amino acid preferences of homologous proteins depend on sequence divergence. Genome Biol Evol 11(1):121–135
Finkelstein A, Gutin A, Badretdinov A (1994) Boltzmann-like statistics of protein architectures. Origins and consequences. Sub-Cell Biochem 24:1–26
Freeland SJ, Hurst LD (1998) The genetic code is one in a million. J Mol Evol 47(3):238–248
Ghosh K, Dill KA (2009) Computing protein stabilities from their chain lengths. Proc Natl Acad Sci 106(26):10649–10654
Goldstein R.A (2011) The evolution and evolutionary consequences of marginal thermostability in proteins. Proteins 79(5):1396–1407
Guerois R, Nielsen JE, Serrano L (2002) Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 320(2):369–387
Hartling J, Kim J (2008) Mutational robustness and geometrical form in protein structures. J Exp Zool B 310(3):216–226
Jerrum M, Sinclair A (1996) The markov chain monte carlo method: an approach to approximate counting and integration. PWS Publishing, Boston
Jiang RJ (2019) Exhaustive mapping of missense variation in coronary heart disease-related genes. Ph.D. thesis, University of Toronto, Canada
Kitzman JO, Starita LM, Lo RS, Fields S, Shendure J (2015) Massively parallel single-amino-acid mutagenesis. Nat Methods 12(3):203–206
Lau KF, Dill KA (1989) A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22(10):3986–3997
Li H, Helling R, Tang C, Wingreen N (1996) Emergence of preferred structures in a simple model of protein folding. Science 273(5275):666
Lind PA, Arvidsson L, Berg OG, Andersson DI (2016) Variation in mutational robustness between different proteins and the predictability of fitness effects. Mol Biol Evol 34(2):408–418
Lipman DJ, Wilbur WJ (1991) Modelling neutral and selective evolution of protein folding. Proc R Soc Lond B 245(1312):7–11
Lynch M, Conery JS (2003) The origins of genome complexity. Science 302(5649):1401–1404
Matreyek KA, Starita LM, Stephany JJ, Martin B, Chiasson MA, Gray VE, Kircher M, Khechaduri A, Dines JN, Hause RJ et al (2018) Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat Genet 50(6):874–882
Maynard-Smith J (1970) Natural selection and the concept of a protein space. Nature 225(5232):563–564
McLaughlin RN, Poelwijk FJ, Raman A, Gosal WS, Ranganathan R (2012) The spatial architecture of protein function and adaptation. Nature 491(7422):138–142
Melamed D, Young DL, Gamble CE, Miller CR, Fields S (2013) Deep mutational scanning of an rrm domain of the saccharomyces cerevisiae poly (a)-binding protein. RNA 19(11):1537–1551
Melnikov A, Rogov P, Wang L, Gnirke A, Mikkelsen TS (2014) Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes. Nucleic Acids Res 42(14):e112–e112
Miller J.H (1979) Genetic studies of the lac repressor: Xi. on aspects of lac repressor structure suggested by genetic experiments. J Mol Biol 131(2):249–258
Mishra P, Flynn JM, Starr TN, Bolon DN (2016) Systematic mutant analyses elucidate general and client-specific aspects of hsp90 function. Cell Rep 15(3):588–598
Miyazawa S, Jernigan RL (1985) Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18(3):534–552
Nei M (2013) Mutation-driven evolution. OUP, Oxford
Olson CA, Wu NC, Sun R (2014) A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr Biol 24(22):2643–2651
Phillips R, Kondev J, Theriot J, Garcia H (2012) Physical biology of the cell. Garland Science, New York
Rao S.P, Carlstrom D.E, Miller W.G (1974) Collapsed structure polymers. Scattergun approach to amino acid copolymers. Biochemistry 13(5):943–952
Redler RL, Das J, Diaz JR, Dokholyan NV (2016) Protein destabilization as a common factor in diverse inherited disorders. J Mol Evol 82(1):11–16
Reidys C, Stadler PF, Schuster P (1997) Generic properties of combinatory maps: neutral networks of rna secondary structures. Bull Math Biol 59(2):339–397
Rennell D, Bouvier SE, Hardy LW, Poteete AR (1991) Systematic mutation of bacteriophage t4 lysozyme. J Mol Biol 222(1):67–88
Robertson AD, Murphy KP (1997) Protein structure and the energetics of protein stability. Chem Rev 97(5):1251–1268
Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, Carter L, Ravichandran R, Mulligan VK, Chevalier A et al (2017) Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357(6347):168–175
Roscoe BP, Thayer KM, Zeldovich KB, Fushman D, Bolon DN (2013) Analyses of the effects of all ubiquitin point mutants on yeast growth rate. J Mol Biol 425(8):1363–1377
Shendure J, Fields S (2016) Massively parallel genetics. Genetics 203(2):617–619
Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P (2002) Prosite: a documented database using patterns and profiles as motif descriptors. Brief Bioinform 3(3):265–274
Starita LM, Pruneda JN, Lo RS, Fowler DM, Kim HJ, Hiatt JB, Shendure J, Brzovic PS, Fields S, Klevit RE (2013) Activity-enhancing mutations in an e3 ubiquitin ligase identified by high-throughput mutagenesis. Proc Natl Acad Sci 110(14):E1263–E1272
Stiffler MA, Hekstra DR, Ranganathan R (2015) Evolvability as a function of purifying selection in tem-1 \(\beta\)-lactamase. Cell 160(5):882–892
Suckow J, Markiewicz P, Kleina LG, Miller J, Kisters-Woike B, Müller-Hill B (1996) Genetic studies of the lac repressor xv: 4000 single amino acid substitutions and analysis of the resulting phenotypes on the basis of the protein structure. J Mol Biol 261(4):509–523
Sun S, Weile J, Verby M, Wu Y, Wang Y, Cote AG, Fotiadou I, Kitaygorodsky J, Vidal M, Rine J et al (2020) A proactive genotype-to-patient-phenotype map for cystathionine beta-synthase. Genome Med 12(1):1–18
Tenaillon O, Barrick JE, Ribeck N, Deatherage DE, Blanchard JL, Dasgupta A, Wu GC, Wielgoss S, Cruveiller S, Médigue C et al (2016) Tempo and mode of genome evolution in a 50,000-generation experiment. Nature 536(7615):165–170
Tokuriki N, Stricher F, Schymkowitz J, Serrano L, Tawfik DS (2007) The stability effects of protein mutations appear to be universally distributed. J Mol Biol 369(5):1318–1332
Tokuriki N, Tawfik DS (2009) Stability effects of mutations and protein evolvability. Curr Opin Struct Biol 19(5):596–604
Van Nimwegen E, Crutchfield JP, Huynen M (1999) Neutral evolution of mutational robustness. Proc Natl Acad Sci 96(17):9716–9720
Weile J, Sun S, Cote AG, Knapp J, Verby M, Mellor JC, Wu Y, Pons C, Wong C, van Lieshout N et al (2017) A framework for exhaustively mapping functional missense variants. Mol Syst Biol 13(12):957
Xia Y, Levitt M (2002) Roles of mutation and recombination in the evolution of protein thermodynamics. Proc Natl Acad Sci 99(16):10382–10387
Yue P, Li Z, Moult J (2005) Loss of protein structure stability as a major causative factor in monogenic disease. J Mol Biol 353(2):459–473
Zhang J, Yang JR (2015) Determinants of the rate of protein sequence evolution. Nat Rev Genet 16(7):409–420
Acknowledgements
The author thanks to Dr. Patricio Orio at the Centro Interdisciplinario de Neurociencias de Valparíso (CINV), for access to computational resources; and to the Chilean National Agency for Research and Development (ANID), for support through the project REDES190089.
Author information
Authors and Affiliations
Corresponding author
Additional information
Handling Editor: Erich Bornberg-Bauer.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Ferrada, E. Quantifying the Mutational Robustness of Protein-Coding Genes. J Mol Evol 89, 357–369 (2021). https://doi.org/10.1007/s00239-021-10009-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00239-021-10009-1