The Ancient Operational Code is Embedded in the Amino Acid Substitution Matrix and aaRS Phylogenies

  • Original Article
  • Journal of Molecular Evolution

Abstract

The underlying structure of the canonical amino acid substitution matrix (aaSM) is examined by considering stepwise improvements in the differential recognition of amino acids according to their chemical properties during the branching history of the two aminoacyl-tRNA synthetase (aaRS) superfamilies. The evolutionary expansion of the genetic code is described by a simple parameterization of the aaSM, in which (i) the number of distinguishable amino acid types, (ii) the matrix dimension and (iii) the number of parameters, each increases by one for each bifurcation in an aaRS phylogeny. Parameterized matrices corresponding to trees in which the size of an amino acid sidechain is the only discernible property behind its categorization as a substrate, exclusively for a Class I or II aaRS, provide a significantly better fit to empirically determined aaSM than trees with random bifurcation patterns. A second split between polar and nonpolar amino acids in each Class effects a vastly greater further improvement. The earliest Class-separated epochs in the phylogenies of the aaRS reflect these enzymes’ capability to distinguish tRNAs through the recognition of acceptor stem identity elements via the minor (Class I) and major (Class II) helical grooves, which is how the ancient operational code functioned. The advent of tRNA recognition using the anticodon loop supports the evolution of the optimal map of amino acid chemistry found in the later genetic code, an essentially digital categorization, in which polarity is the major functional property, compensating for the unrefined, haphazard differentiation of amino acids achieved by the operational code.
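The counting rule stated in the abstract (one more distinguishable type, one more matrix dimension and one more parameter per bifurcation) can be illustrated with a toy construction. The sketch below is not the authors' parameterization, which the abstract does not spell out. It assumes, purely for illustration, that each bifurcation carries a single parameter and that the off-diagonal entry for two amino acid types equals the parameter of the bifurcation that separates them. The function name `parameterized_matrix`, the group labels and the numeric values are all hypothetical.

```python
from itertools import count


def parameterized_matrix(tree, params):
    """Build a toy dissimilarity matrix from a binary tree.

    tree   : nested 2-tuples whose leaves are strings (amino acid types).
    params : one number per bifurcation, listed in pre-order.
    Returns (leaves, matrix), where matrix[i][j] for i != j is the parameter
    of the bifurcation separating leaves i and j (an illustrative assumption,
    not the model used in the paper). The diagonal is set to 0.0.
    """
    counter = count()
    leaves = []
    separator = {}  # (leaf_a, leaf_b) -> index of the separating bifurcation

    def walk(node):
        if isinstance(node, str):
            leaves.append(node)
            return [node]
        idx = next(counter)  # pre-order index of this bifurcation
        left, right = node
        left_leaves = walk(left)
        right_leaves = walk(right)
        for a in left_leaves:
            for b in right_leaves:
                separator[(a, b)] = idx
                separator[(b, a)] = idx
        return left_leaves + right_leaves

    walk(tree)
    n_bifurcations = next(counter)
    if len(params) != n_bifurcations:
        raise ValueError(
            f"expected {n_bifurcations} parameters, got {len(params)}"
        )

    matrix = [
        [0.0 if a == b else params[separator[(a, b)]] for b in leaves]
        for a in leaves
    ]
    return leaves, matrix


# Hypothetical four-type tree: a Class I / Class II split at the root,
# followed by a polar / nonpolar split inside each Class.
tree = (("I_polar", "I_nonpolar"), ("II_polar", "II_nonpolar"))
params = [3.0, 1.0, 1.5]  # root, Class I split, Class II split (made-up values)

leaves, matrix = parameterized_matrix(tree, params)
for label, row in zip(leaves, matrix):
    print(label, row)
```

In this toy version, three bifurcations give four distinguishable types and a 4 × 4 matrix, so the number of types always exceeds the number of bifurcations by one. Fitting such a matrix to an empirical aaSM would require a distance or correlation measure, which is outside the scope of this sketch.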

Notes

  1. We use “differentiable” rather than “distinguishable” when we want to emphasise the means of making a distinction, rather than simply the existence of a distinction, especially among amino acids.

Acknowledgements

PRW thanks the Alexander von Humboldt Foundation for its continual support; both Peter Stadler and Alexei Drummond for their encouragement, more than a decade ago, to carry out this study; Andrew Torda for advice about substitution matrices and Charlie Carter for constant helpful correspondence and discussions.

Funding

This research was supported by Australian Research Council (ARC) Discovery Grant DP150100088 to Barbara R. Holland and Jeremy G. Sumner, and Research Training Program scholarship to Julia A. Shore.

Corresponding author

Correspondence to Julia A. Shore.

Additional information

Handling Editor: Arndt von Haeseler.

About this article

Cite this article

Shore, J.A., Holland, B.R., Sumner, J.G. et al. The Ancient Operational Code is Embedded in the Amino Acid Substitution Matrix and aaRS Phylogenies. J Mol Evol 88, 136–150 (2020). https://doi.org/10.1007/s00239-019-09918-z

  • DOI: https://doi.org/10.1007/s00239-019-09918-z
