Abstract
Proteins can evolve by accumulating changes in their amino acid sequences. These changes are caused mainly by missense mutations in their coding DNA sequences. Mutations with neutral or positive effects on fitness can be maintained, while deleterious mutations tend to be eliminated by natural selection. Amino acid changes are influenced by the biophysical, chemical, and biological properties of amino acids, many of which can affect the function and expression of proteins. These properties can be expressed as numerical indexes, which help to predict functional and structural aspects of proteins and allow statistical inference of selection pressure on amino acid usage. The accuracy of these analyses may be compromised by the existence of several numerical indexes that measure the same amino acid property and by the lack of objective criteria for determining the most accurate and biologically relevant index. In the present study, the gradient consistency test was used to estimate the magnitude of directional selection imparted by amino acid biochemical and biophysical properties on protein evolution.
Funding
GBF is supported by a fellowship from the Sao Paulo Research Foundation, FAPESP (Grant 2018/23038–7). SRPL is supported by a grant from the Brazilian Council for Scientific and Technological Development, CNPq (Grant 305783/2018–1).
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Handling Editor: Ashley Teufel.
Cite this article
Fogalli, G.B., Line, S.R.P. Estimating the Influence of Physicochemical and Biochemical Property Indexes on Selection for Amino Acids Usage in Eukaryotic Cells. J Mol Evol 89, 257–268 (2021). https://doi.org/10.1007/s00239-021-10003-7
DOI: https://doi.org/10.1007/s00239-021-10003-7