Abstract
Heterotachy—the change in sequence evolutionary rate over time—is a common feature of protein molecular evolution. Decades of studies have shed light on the conditions under which heterotachy occurs, and there is evidence that site-specific evolutionary rate shifts are correlated with changes in protein function. Here, we present a large-scale, computational analysis using thousands of protein sequence alignments from animal and plant proteomes, representing genes related either by orthology (speciation events) or paralogy (gene duplication), to compare sequence divergence patterns in orthologous vs. paralogous sequence alignments. We use sequence-based phylogenetic analyses to infer overall sequence divergence (tree length/number of sequences) and to fit site-specific rates to a discrete gamma distribution with a shape parameter α. This inference method is applied to real protein sequence alignments, as well as alignments simulated under various models of protein sequence evolution. Our simulations indicate that sequence divergence and the α parameter are positively correlated when sequences evolve with heterotachy, meaning that inferred site rate distributions appear more uniform as sequences diverge. Divergence and α are also positively correlated in both orthologous and paralogous genes, but the average increase in α (as a function of divergence) is significantly higher in paralogous protein alignments than in orthologous alignments. This result is consistent with the widely held view that recently duplicated proteins initially evolve under relaxed selective pressure, promoting functional divergence by accumulation of amino acid replacements, and hence experience more evolutionary rate fluctuations than orthologous proteins. We discuss these findings in the context of the ortholog conjecture, a long-standing assumption in molecular evolution, which posits that protein sequences related by orthology tend to be more functionally conserved than paralogous proteins.
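The two quantities named above can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a simple Newick string with named tips and no quoted labels, and the function names `normalized_tree_length` and `gamma_rate_cv` are hypothetical. The first function computes divergence as tree length divided by the number of sequences. The second uses the standard property that a gamma distribution with mean 1 and shape α has variance 1/α, so the coefficient of variation of site rates is 1/√α and shrinks as α grows. That is the sense in which a larger α corresponds to a more uniform site rate distribution.

```python
import math
import re

# Branch length after a colon, e.g. ":0.05" or ":1e-3".
BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)")
# First character of a tip label: a label that directly follows "(" or ",".
# Assumption: tips are named and labels are unquoted.
TIP_START = re.compile(r"[(,]\s*[^\s(),:;]")


def normalized_tree_length(newick: str) -> float:
    """Sum of all branch lengths divided by the number of tips (sequences)."""
    lengths = [float(x) for x in BRANCH_LENGTH.findall(newick)]
    n_tips = len(TIP_START.findall(newick))
    if n_tips == 0:
        raise ValueError("no tip labels found in Newick string")
    return sum(lengths) / n_tips


def gamma_rate_cv(alpha: float) -> float:
    """Coefficient of variation of gamma-distributed site rates with mean 1.

    With shape alpha and mean 1 the variance is 1/alpha, so CV = 1/sqrt(alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / math.sqrt(alpha)


if __name__ == "__main__":
    # Toy three-sequence tree, purely illustrative.
    tree = "((A:0.1,B:0.2):0.05,C:0.3);"
    print("divergence:", normalized_tree_length(tree))
    for alpha in (0.5, 1.0, 4.0):
        print("alpha =", alpha, "rate CV =", gamma_rate_cv(alpha))
```

In a real analysis the tree length and α would come from the phylogenetic inference software rather than from hand-written parsing; the sketch only makes the definitions concrete.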
Acknowledgements
The authors would like to thank Dr. Claus O. Wilke and Dr. Stephanie J. Spielman for their support and assistance during the protein sequence simulation component of this study. We would also like to acknowledge the Instructional & Research Computing Center (IRCC) at Florida International University (https://ircc.fiu.edu) for providing high-performance computing resources that contributed to the research results reported in this paper. This work was supported by a Doctoral Evidence Acquisition Fellowship (Summer 2018) and a Dissertation Year Fellowship (Fall 2018 through Spring 2019) from Florida International University.
Additional information
Handling Editor: Jason de Koning.
About this article
Cite this article
Ahrens, J.B., Teufel, A.I. & Siltberg-Liberles, J. A Phylogenetic Rate Parameter Indicates Different Sequence Divergence Patterns in Orthologs and Paralogs. J Mol Evol 88, 720–730 (2020). https://doi.org/10.1007/s00239-020-09969-7