A Phylogenetic Rate Parameter Indicates Different Sequence Divergence Patterns in Orthologs and Paralogs

Original Article
Journal of Molecular Evolution

Abstract

Heterotachy, the change in sequence evolutionary rate over time, is a common feature of protein molecular evolution. Decades of studies have shed light on the conditions under which heterotachy occurs, and there is evidence that site-specific evolutionary rate shifts are correlated with changes in protein function. Here, we present a large-scale computational analysis of thousands of protein sequence alignments from animal and plant proteomes, representing genes related either by orthology (speciation events) or by paralogy (gene duplication), to compare sequence divergence patterns in orthologous vs. paralogous alignments. We use sequence-based phylogenetic analyses to infer overall sequence divergence (tree length/number of sequences) and to fit site-specific rates to a discrete gamma distribution with shape parameter α. We apply this inference method to real protein sequence alignments as well as to alignments simulated under various models of protein sequence evolution. Our simulations indicate that sequence divergence and the α parameter are positively correlated when sequences evolve with heterotachy, meaning that inferred site rate distributions appear more uniform as sequences diverge. Divergence and α are also positively correlated in both orthologous and paralogous genes, but the average increase in α as a function of divergence is significantly higher in paralogous protein alignments than in orthologous alignments. This result is consistent with the widely held view that recently duplicated proteins initially evolve under relaxed selective pressure, which promotes functional divergence through the accumulation of amino acid replacements, and that they consequently experience more evolutionary rate fluctuations than orthologous proteins. We discuss these findings in the context of the ortholog conjecture, a long-standing assumption in molecular evolution, which posits that protein sequences related by orthology tend to be more functionally conserved than paralogous proteins.
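The divergence measure described above (tree length divided by the number of sequences) can be computed directly from a tree with branch lengths. A minimal sketch, assuming the tree is available as a Newick string whose labels contain no colons, commas, or bracketed comments; the function names are illustrative and not taken from the authors' pipeline.

```python
import re

# Branch lengths follow a colon in Newick, e.g. "A:0.1" or "):0.05".
# Assumption: no colons or commas inside quoted labels or [comments].
_BRANCH_LENGTH = re.compile(r":\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def tree_length(newick: str) -> float:
    """Sum of all branch lengths in a Newick string."""
    return sum(float(value) for value in _BRANCH_LENGTH.findall(newick))


def leaf_count(newick: str) -> int:
    """Number of leaves (sequences).

    Each internal node with k children contributes k - 1 commas; summed over
    the whole tree this equals (number of leaves - 1).
    """
    return newick.count(",") + 1


def divergence(newick: str) -> float:
    """Overall divergence: tree length / number of sequences."""
    return tree_length(newick) / leaf_count(newick)
```

For the four-taxon toy tree `((A:0.1,B:0.2):0.05,C:0.3,D:0.25);` the branch lengths sum to 0.9 over 4 sequences, so `divergence` returns about 0.225. A production pipeline would more likely use a dedicated tree library than a regular expression.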
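The shape parameter α controls how unequal site rates are: a small α means many nearly invariant sites and a few fast ones, while a large α means rates close to uniform. The sketch below, using only the standard library, assumes the widely used discretization into k equal-probability categories whose rates are the category means, with k = 4; the abstract does not state the number of categories or the discretization scheme, and the helper names are hypothetical.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16:  # all terms are positive, so stop once negligible
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(alpha: float, q: float) -> float:
    """Rate r with P(R <= r) = q, where R ~ Gamma(shape=alpha, rate=alpha) has mean 1."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < q:  # bracket the quantile
        lo, hi = hi, 2.0 * hi
    for _ in range(100):  # bisection down to floating-point resolution
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int = 4) -> list[float]:
    """Mean rate of each of k equal-probability categories of the mean-1 gamma."""
    cuts = [gamma_quantile(alpha, i / k) for i in range(1, k)]
    # For this distribution, E[R; R <= b] = P(alpha + 1, alpha * b).
    partial_means = [0.0] + [reg_lower_gamma(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    # Each category has probability 1/k, so its conditional mean is k times its share.
    return [k * (hi - lo) for lo, hi in zip(partial_means, partial_means[1:])]
```

Comparing `discrete_gamma_rates(0.5)` with `discrete_gamma_rates(5.0)` shows the category rates squeezing toward 1 as α grows, which is why a rising α reads as a more uniform inferred rate distribution. The rates always average to 1 by construction.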
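The simulation result can be illustrated with a deliberately simple toy model. This is not the authors' simulation protocol, which generated full alignments and inferred α phylogenetically. Here each site's rate is occasionally redrawn over time (a crude form of heterotachy), and the rate a site appears to have is its average over the elapsed time, standing in for a deeper, more divergent tree. The shape is then estimated by the method of moments, α ≈ mean²/variance. The model, its parameters (`p_shift`, `steps`), and the estimator are assumptions made for illustration.

```python
import random
import statistics


def apparent_alpha(alpha0: float, p_shift: float, steps: int,
                   n_sites: int = 4000, seed: int = 1) -> float:
    """Method-of-moments gamma shape (mean**2 / variance) of time-averaged site rates.

    Hypothetical toy model of heterotachy: every site starts with a rate drawn from a
    mean-1 gamma distribution with shape alpha0, and at each later time step its rate
    is redrawn from that same distribution with probability p_shift.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(n_sites):
        # random.gammavariate takes (shape, scale); scale 1/alpha0 gives mean 1.
        rate = rng.gammavariate(alpha0, 1.0 / alpha0)
        total = rate
        for _ in range(steps - 1):
            if rng.random() < p_shift:
                rate = rng.gammavariate(alpha0, 1.0 / alpha0)
            total += rate
        averages.append(total / steps)
    mean = statistics.fmean(averages)
    return mean ** 2 / statistics.pvariance(averages)


if __name__ == "__main__":
    for p_shift in (0.0, 0.2):
        for steps in (1, 10, 50):
            print(p_shift, steps, round(apparent_alpha(0.5, p_shift, steps), 2))
```

With `p_shift=0.2`, raising `steps` from 1 to 50 inflates the apparent α several-fold, whereas with `p_shift=0.0` (no heterotachy) the apparent α stays near the true value. Averaging over shifting rates shrinks the between-site variance, which is one intuition for a positive divergence-α correlation.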

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Acknowledgements

The authors would like to thank Dr. Claus O. Wilke and Dr. Stephanie J. Spielman for their support and assistance with the protein sequence simulation component of this study. We also acknowledge the Instructional & Research Computing Center (IRCC) at Florida International University (https://ircc.fiu.edu) for providing HPC resources that contributed to the research results reported in this paper. This work was supported by a Doctoral Evidence Acquisition Fellowship (Summer 2018) and a Dissertation Year Fellowship (Fall 2018 through Spring 2019) from Florida International University.

Author information

Correspondence to Joseph B. Ahrens or Jessica Siltberg-Liberles.

Additional information

Handling Editor: Jason de Koning.

Electronic supplementary material

Supplementary file1 (PDF 4521 kb)

Supplementary file2 (PDF 77 kb)

About this article

Cite this article

Ahrens, J.B., Teufel, A.I. & Siltberg-Liberles, J. A Phylogenetic Rate Parameter Indicates Different Sequence Divergence Patterns in Orthologs and Paralogs. J Mol Evol 88, 720–730 (2020). https://doi.org/10.1007/s00239-020-09969-7

  • DOI: https://doi.org/10.1007/s00239-020-09969-7
