Abstract
The underlying structure of the canonical amino acid substitution matrix (aaSM) is examined by considering stepwise improvements in the differential recognition of amino acids according to their chemical properties during the branching history of the two aminoacyl-tRNA synthetase (aaRS) superfamilies. The evolutionary expansion of the genetic code is described by a simple parameterization of the aaSM, in which (i) the number of distinguishable amino acid types, (ii) the matrix dimension and (iii) the number of parameters each increase by one for each bifurcation in an aaRS phylogeny. Parameterized matrices corresponding to trees in which the size of an amino acid sidechain is the only discernible property behind its categorization as a substrate, exclusively for a Class I or II aaRS, provide a significantly better fit to empirically determined aaSMs than do matrices corresponding to trees with random bifurcation patterns. A second split between polar and nonpolar amino acids in each Class effects a vastly greater further improvement. The earliest Class-separated epochs in the phylogenies of the aaRS reflect these enzymes’ capability to distinguish tRNAs through the recognition of acceptor stem identity elements via the minor (Class I) and major (Class II) helical grooves, which is how the ancient operational code functioned. The advent of tRNA recognition using the anticodon loop supports the evolution of the optimal map of amino acid chemistry found in the later genetic code. This map is an essentially digital categorization in which polarity is the major functional property, and it compensates for the unrefined, haphazard differentiation of amino acids achieved by the operational code.
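The counting rule stated above (one more type, one more matrix dimension and one more parameter per bifurcation) can be illustrated with a minimal sketch. It is not the parameterization used in the article, whose details are not given in this abstract. It assumes, purely for illustration, that each bifurcation carries a single exchange parameter shared by every pair of types that the bifurcation separates. The function name `build_matrix`, the placeholder labels (`s1`, `b1`, ...) and the numerical rates are all hypothetical.

```python
import numpy as np

def build_matrix(alphabet, splits):
    """Build a symmetric exchange matrix from an ordered list of bifurcations.

    Each split is (left_members, right_members, rate) and must divide one
    existing group into two non-empty parts. Assumption of this sketch: the
    rate attached to a split applies to every pair of types separated by it.
    """
    groups = [frozenset(alphabet)]      # start with a single undifferentiated type
    pair_rate = {}                      # rate for each separated pair of members
    for left, right, rate in splits:
        left, right = frozenset(left), frozenset(right)
        parent = left | right
        if not left or not right or left & right or parent not in groups:
            raise ValueError("each split must bifurcate an existing group")
        idx = groups.index(parent)
        groups[idx:idx + 1] = [left, right]   # one more type per bifurcation
        for a in left:
            for b in right:
                pair_rate[frozenset((a, b))] = rate
    n = len(groups)
    m = np.zeros((n, n))                # diagonal left at zero in this sketch
    for i in range(n):
        for j in range(i + 1, n):
            a = next(iter(groups[i]))   # any representative gives the same rate
            b = next(iter(groups[j]))
            m[i, j] = m[j, i] = pair_rate[frozenset((a, b))]
    return groups, m

# Hypothetical six-letter alphabet and three bifurcations (illustrative only).
alphabet = ["s1", "s2", "b1", "b2", "b3", "b4"]
splits = [
    (["s1", "s2"], ["b1", "b2", "b3", "b4"], 0.1),  # first, coarse split
    (["b1", "b2"], ["b3", "b4"], 0.5),              # second split
    (["s1"], ["s2"], 0.9),                          # third split
]
groups, m = build_matrix(alphabet, splits)
print(len(groups), m.shape, len(splits))
```

With three bifurcations the sketch yields four types, a 4 × 4 matrix and three parameters, so each quantity grows by one per bifurcation. Starting the parameter count at zero for the undivided alphabet is a choice of this sketch, not a statement about the article's model.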
Notes
We use “differentiable” rather than “distinguishable” when we want to emphasise the means of making a distinction, rather than simply the existence of a distinction, especially among amino acids.
Acknowledgements
PRW thanks the Alexander von Humboldt Foundation for its continual support; both Peter Stadler and Alexei Drummond for their encouragement, more than a decade ago, to carry out this study; Andrew Torda for advice about substitution matrices and Charlie Carter for constant helpful correspondence and discussions.
Funding
This research was supported by Australian Research Council (ARC) Discovery Grant DP150100088 to Barbara R. Holland and Jeremy G. Sumner, and by a Research Training Program scholarship to Julia A. Shore.
Additional information
Handling Editor: Arndt von Haeseler.
Cite this article
Shore, J.A., Holland, B.R., Sumner, J.G. et al. The Ancient Operational Code is Embedded in the Amino Acid Substitution Matrix and aaRS Phylogenies. J Mol Evol 88, 136–150 (2020). https://doi.org/10.1007/s00239-019-09918-z