Quantifying the Mutational Robustness of Protein-Coding Genes

  • Original Article
  • Journal of Molecular Evolution

Abstract

We use large-scale mutagenesis data and computer simulations to quantify the mutational robustness of protein-coding genes, taking into account constraints arising from protein function and the genetic code. Analyses of the distribution of amino acid substitutions from 18 mutagenesis studies reveal an average of 45% neutral variants, whereas mutagenesis data for 12 proteins artificially designed under no constraint other than stability reach an average of 60%. Simulations using a lattice protein model allow us to contrast these estimates with the expected mutational robustness of protein families by generating unbiased samples of foldable sequences, which we find to have 30% neutral variants. In agreement with the mutagenesis data for designed proteins, the model shows that maximally robust protein families might access up to twice the fraction of neutral variants observed in the unbiased samples (i.e., 60%). A biophysical model of protein-ligand binding suggests that constraints associated with molecular function have only a moderate impact on robustness, of approximately 5 to 10% of neutral variants, and that the direction of this effect depends on the relation between functional performance and thermodynamic stability. Although the genetic code constrains the access of a gene’s nucleotide sequence to only 30% of the full distribution of amino acid mutations, it provides an extra 15 to 20% of neutral variants to the estimates above, such that the expected, observed, and maximal robustness of protein-coding genes are approximately 50, 65, and 75%, respectively. We discuss our results in the light of three main hypotheses put forward to explain the existence of mutationally robust genes.
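The genetic-code constraint mentioned in the abstract can be illustrated with a small enumeration over the standard genetic code. The sketch below is not the author's procedure: it assumes equal weighting of all 61 sense codons and uniform single-nucleotide changes, and the function names (`neighbors`, `accessible_fraction`, `synonymous_fraction`) are hypothetical. It computes the average share of the 19 alternative amino acids reachable by one nucleotide change, and the share of single-nucleotide changes that are synonymous.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest); "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def accessible_fraction():
    """Mean fraction of the 19 other amino acids reachable by one nucleotide change.

    Assumption: every sense codon is weighted equally (no codon-usage bias).
    """
    fractions = []
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        reachable = {CODE[n] for n in neighbors(codon)} - {aa, "*"}
        fractions.append(len(reachable) / 19)
    return sum(fractions) / len(fractions)


def synonymous_fraction():
    """Fraction of single-nucleotide changes from sense codons that keep the amino acid.

    Assumption: all nucleotide changes are equally likely (no transition bias).
    """
    synonymous = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for n in neighbors(codon):
            total += 1
            synonymous += CODE[n] == aa
    return synonymous / total


if __name__ == "__main__":
    print(f"accessible amino acid changes: {accessible_fraction():.2f}")
    print(f"synonymous single-nucleotide changes: {synonymous_fraction():.2f}")
```

Under these simplifying assumptions the first quantity should fall near the roughly 30% access figure quoted in the abstract, although the article's own estimate may rest on a different weighting of codons and mutations.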

Acknowledgements

The author thanks Dr. Patricio Orio at the Centro Interdisciplinario de Neurociencias de Valparaíso (CINV) for access to computational resources, and the Chilean National Agency for Research and Development (ANID) for support through the project REDES190089.

Author information

Corresponding author

Correspondence to Evandro Ferrada.

Additional information

Handling Editor: Erich Bornberg-Bauer.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 261 KB)

Cite this article

Ferrada, E. Quantifying the Mutational Robustness of Protein-Coding Genes. J Mol Evol 89, 357–369 (2021). https://doi.org/10.1007/s00239-021-10009-1

  • DOI: https://doi.org/10.1007/s00239-021-10009-1
