A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication

https://doi.org/10.1016/j.jsb.2020.107608

Highlights

  • Tandem repeats might have evolved by exon duplication and rearrangement.

  • We present a tool to compare structural repeats and exon patterns.

  • We detect evolutionarily-related sequence phase and periodicity in repeat proteins.

  • We tested and confirmed the conservation of exon patterns in homologs of these proteins.

  • We propose that exon periodicity, when detected, can be used to improve repeat annotation.

Abstract

Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They contributed to the evolution of organism complexity, specializing in functions that require quick adaptability, such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and the comparison of the two definitions (structural units and exons) are visualized in a single matrix, the “repeat/exon plot”. An analysis of different repeat protein families, including the solenoids Leucine-Rich, Ankyrin, Pumilio and HEAT repeats and the β propellers Kelch-like, WD40 and RCC1, shows different behaviors, illustrated here through examples. For each example, the analysis of the exon mapping in homologous proteins supports the conservation of their exon patterns. We propose that when a clear-cut relationship between exon and structural boundaries can be identified, it is possible to infer a specific “evolutionary pattern” which may improve TRP detection and classification.

Introduction

Tandem repeat proteins (TRPs) are a puzzling class of proteins whose 3D architecture consists of the repetition of a simple structural module, called a “unit” (Kajava, 2012). In most instances structural units map one-to-one to repetitive sequence patterns, although the latter are frequently less conserved (Paladin and Tosatto, 2015). Structural units are stabilized by an axis of hydrophobic intra-unit interactions rather than a core (Espada et al., 2015). This arrangement confers unique properties to TRPs, including a linear folding pathway where each structural unit drives the folding of the following one (Kobe and Kajava, 2000). In some cases a binding partner is involved in the stabilization (Perez-Riba et al., 2020). Their unique arrangement and structural plasticity allow insertions/deletions of structural units, which can be of remarkable structural diversity provided that they are compatible with the interactions within the stabilizing axis. An additional consequence is that TRPs show a higher surface/volume ratio than globular proteins. All these features make them a versatile framework for the formation of protein–protein interactions (Kobe and Kajava, 2001, Mosavi et al., 2004, Smith, 2008). TRPs are central in cell signaling and regulation and are widely distributed across functional pathways, performing binding functions that require high evolutionary adaptability, such as immunity-related functions (Andrade et al., 2001, Delucchi et al., 2020). TRPs are abundant across the tree of life, but have specific roles in Eukaryotes and contributed to their evolution (Andrade et al., 2001, Schüler and Bornberg-Bauer, 2016, Schaper and Anisimova, 2015). Their prevalent role as binders and scaffolds was of high importance in the development of eukaryotic signaling and the management of complexity, and indeed they are far more abundant in multicellular organisms (Andrade et al., 2001).
TRP folds emerged several times across different lineages, possibly arising from the multiple duplication of a segment in coding sequences.

For Eukaryotes, it has been suggested that repeated segments could correspond to exons, thus being easily duplicated and/or shuffled thanks to the modular intron/exon structure (Schaper and Anisimova, 2015). Domains encoded by single exons (exon-bordering domains) in proteins were demonstrated to be not only more abundant and widespread than those that are not (Liu et al., 2005), but also to show accelerated evolution (Lorente-Galdos, 2013), demonstrating the effectiveness of this framework. However, this refers mainly to autonomously folding domains, while the case of TRPs is more challenging. Each TR region is structurally symmetric, being composed of several small modules of a few secondary structure elements (units), similar in structure and in the pattern of stabilizing interactions. Cases where the pattern of this structural symmetry corresponds to a pattern in the exon arrangement make for a strong argument in favor of the hypothesis of evolution through duplication of exonic segments. Where it does not, the genetic rearrangement events that guided the protein evolution happened at a different level, from protein domains to entire genes (Dohmen et al., 2020). When comparing exon structure to repeat units, it is important to consider that structural units are not autonomously folding, as in most cases they need their neighbours. As a result, an exon coding for a repeat unit is not as evolutionarily versatile as one coding for a full domain, since it needs to be shuffled or duplicated in tandem with other repeats. On the other hand, duplication of an exon that includes multiple structural units is consistent with that requirement, often resulting in complex exon patterns that may include single-unit exons, multiple-unit exons or both. The latter may be explained, for example, by duplications and subsequent fusion in regions within the repeat that benefit from the multiple-unit exon stability.
Thus, TRPs are compatible with evolution through intron-facilitated exon duplication, which could also give them the advantage of easy insertion and removal of structural units to adapt to different binders. This is particularly relevant for TRP families that have a highly variable number of structural units, either across different organisms or by differential exon expression in cells or conditions. Evidence of exon duplication in proteins has been previously assessed by comparison of exon length and exon/intron phases (Fedorov et al., 1998) as well as by alignment between encoded sequences (Letunic et al., 2002). These features have also been investigated in TRPs, showing correspondence between structural unit and exon patterns in some TRP single-case studies (Haigis et al., 2002, Björklund et al., 2010, Light et al., 2012), but the literature in the field has so far struggled to draw general conclusions for all repeat folds (Schaper et al., 2014, Street et al., 2006). One of the most recent studies on the topic establishes that boundaries of structured domains tend to coincide with exon boundaries, while those of disordered regions do not (Smithers, 2020). Repeat domains fall somewhere in between these two categories, due to their linear and large-surface structure and peculiar folding pathway.
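
As an illustration of the exon length and intron phase comparison mentioned above, the following minimal sketch computes intron phases from coding-exon lengths and flags internal exons whose flanking introns share the same phase (such exons can be duplicated in tandem without shifting the reading frame). The function names and the example lengths are hypothetical and are not taken from the cited studies or from the tool described here.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase (0, 1 or 2) of each intron between consecutive coding exons.

    The phase is the number of coding nucleotides upstream of the intron, modulo 3.
    """
    phases = []
    total = 0
    for length in cds_exon_lengths[:-1]:  # there is no intron after the last exon
        total += length
        phases.append(total % 3)
    return phases


def symmetric_internal_exons(cds_exon_lengths):
    """Return the 0-based indices of internal exons flanked by introns of equal phase."""
    phases = intron_phases(cds_exon_lengths)
    return [k for k in range(1, len(cds_exon_lengths) - 1)
            if phases[k - 1] == phases[k]]


# Hypothetical coding-exon lengths (nucleotides), for illustration only
lengths = [100, 72, 72, 72, 50]
print(intron_phases(lengths))             # [1, 1, 1, 1]
print(symmetric_internal_exons(lengths))  # [1, 2, 3]
```

A run of equal-length, same-phase exons like the one in this toy input is the kind of signal that exon-duplication analyses look for; real data would also require handling of untranslated regions and alternative transcripts.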

In order to visualize the repeat/exon patterns in TRPs and support their quantification, we exploited the RepeatsDB (Di Domenico et al., 2014, Paladin et al., 2017) database of tandem repeat protein structures. TRP structures are classified in the database according to the structural unit length and type of contacts between units. We focused on structures with a variable number of units and mainly stabilized by intra-unit interactions, falling into classes III (elongated repeats including solenoids) and IV (closed repeats or toroids). The exon structure of these repeat regions was compared to the structural repeat modularity, to identify patterns of association. We mapped information about the position of structural units from RepeatsDB (Paladin et al., 2017), together with structure (PDB (RCSB Protein Data Bank, 2020)) and sequence data (UniProt (UniProt Consortium, 2019) and Ensembl (Cunningham, 2019)), and designed a matrix, the repeat-exon plot, that merges useful information to support this comparison: (i) the length and position of exon boundaries and structural units along the protein, (ii) the structural similarity between units, and (iii) the structural similarity between exon-bordering fragments. While the position of exons with respect to structural units (i) provides information about the presence or absence of an exon-defined phase, the structural similarity (ii/iii) between fragments highlights the pattern within the structural unit or exon definitions. Information about the exon/unit relationships in TRPs would be of use to support hypotheses about their evolutionary mechanisms in relation to structural properties and folding pathways, as well as to support their detection and annotation. For example, when the exon-bordering fragments in a repeat region have almost identical sequence and structure, this strongly supports the hypothesis of evolution of the region through exon duplication.
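
The notion of an exon-defined phase in point (i) can be illustrated with a small sketch that measures where each exon start falls relative to the structural unit that contains it. This is only one possible way to quantify the idea, not the method implemented in the tool; the function names, the tolerance parameter and the coordinates are hypothetical.

```python
from bisect import bisect_right


def boundary_offsets(exon_starts, unit_starts):
    """For each exon start, return its offset (in residues) from the closest
    preceding structural-unit start. Exon starts before the first unit are skipped."""
    units = sorted(unit_starts)
    offsets = []
    for pos in sorted(exon_starts):
        i = bisect_right(units, pos) - 1  # index of the last unit start <= pos
        if i >= 0:
            offsets.append(pos - units[i])
    return offsets


def has_constant_phase(offsets, tolerance=2):
    """True if all offsets agree within `tolerance` residues (hypothetical threshold)."""
    return bool(offsets) and max(offsets) - min(offsets) <= tolerance


# Hypothetical protein coordinates, for illustration only
unit_starts = [10, 34, 58, 82]
exon_starts = [15, 39, 63, 87]
offsets = boundary_offsets(exon_starts, unit_starts)
print(offsets)                      # [5, 5, 5, 5]
print(has_constant_phase(offsets))  # True
```

A constant offset, as in this toy input, would indicate that exon boundaries fall at the same position within every unit, which is the situation described above as an exon-defined phase.
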
In particular, when TRP families show a very consistent repeat/exon pattern, this information provides an evolutionarily related periodicity that should be taken into account in the annotation of repeats in sequences and structures. This is relevant in the context of an ongoing collaboration with the Pfam database of protein families (El-Gebali, 2019) to shed light on the non-annotated fraction of proteomes, enriched in disorder and repeats (Mistry, 2013). The complete dataset and code are available at gitlab.com/refract-rise/repeat-exon, featuring information from all sources of data.

Section snippets

Repeat-exon plot

The repeat-exon plot is a matrix that visualizes the pairwise comparison of structural fragments, as defined by exon boundaries and by structural unit boundaries. The matrix shows the exon array in the upper half (from top to bottom and left to right) and the structural unit array in the lower half (from left to right and top to bottom), both laid out along the UniProt sequence length (the diagonal). The matrix is thus a combination of two matrices aligned along the diagonal.
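
The layout described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the project repository: the function names are hypothetical, fragments are given as 1-based inclusive (start, end) ranges on the protein sequence, and the pairwise similarity scores are assumed to come from an external structural alignment step.

```python
def segment_index(position, segments):
    """Return the index of the (start, end) segment containing position, or None."""
    for k, (start, end) in enumerate(segments):
        if start <= position <= end:
            return k
    return None


def repeat_exon_matrix(length, exons, units, exon_sim, unit_sim):
    """Build a length x length matrix combining two pairwise comparisons.

    Cells above the diagonal hold the similarity between the exon fragments
    covering residues i and j; cells below the diagonal hold the similarity
    between the structural units covering them. The diagonal and any cell
    involving an uncovered residue are left as None.
    """
    matrix = [[None] * length for _ in range(length)]
    for i in range(length):
        for j in range(length):
            if i == j:
                continue
            segments, sim = (exons, exon_sim) if i < j else (units, unit_sim)
            a = segment_index(i + 1, segments)  # residues are 1-based
            b = segment_index(j + 1, segments)
            if a is not None and b is not None:
                matrix[i][j] = sim[a][b]
    return matrix


# Hypothetical toy input: 6 residues, 2 exons, 3 structural units
exons = [(1, 3), (4, 6)]
units = [(1, 2), (3, 4), (5, 6)]
exon_sim = [[1.0, 0.8], [0.8, 1.0]]
unit_sim = [[1.0, 0.9, 0.7], [0.9, 1.0, 0.6], [0.7, 0.6, 1.0]]
m = repeat_exon_matrix(6, exons, units, exon_sim, unit_sim)
print(m[0][5], m[5][0])  # 0.8 0.7
```

Working at residue resolution keeps both definitions on the same axis, so exon and unit boundaries can be compared directly along the diagonal.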

Rows and columns represent exons and

Results

To study the correspondence between the periodicity of tandem repeat proteins (TRPs) and their coding exons, we compared information on protein structures and exon boundaries mapped to protein sequences of TRPs. Exons and protein domain boundaries have previously been extensively compared to derive information about protein evolution (Liu et al., 2005, Liu and Grigoriev, 2004). In this study, we focused on TRP structures in relation to their exon arrangement, investigating the

Discussion

The present study provides a visualization tool to assess the correspondence between exons and structural symmetries in TRPs. The match between exon and structural repeat unit patterns observed in some single-case studies supports the hypothesis of repeat evolution through exon duplication and rearrangement (Haigis et al., 2002, Björklund et al., 2010, Light et al., 2012, Liu, 2003). To assess to what extent this phenomenon occurred at large, we designed a repeat/exon plot which

Conclusion

Tandem repeats in eukaryotes perform unique functions requiring quick adaptability. It has been suggested that some repeat families evolved by exon duplication and rearrangement. Here we designed a repeat/exon plot to visualize the relationship between exon and structural symmetries. We discussed Leucine-Rich, Ankyrin, Pumilio and β propeller repeats, where very specific repeat/exon patterns are well conserved within the same protein family. We facilitated the contextual use of exon information

CRediT authorship contribution statement

Lisanna Paladin: Conceptualization, Software, Writing - original draft. Marco Necci: Methodology, Visualization, Writing - review & editing. Damiano Piovesan: Supervision, Conceptualization, Writing - review & editing. Pablo Mier: Writing - review & editing. Miguel A. Andrade-Navarro: Writing - review & editing, Conceptualization. Silvio C.E. Tosatto: Conceptualization, Funding acquisition, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 823886.

References (50)

  • Y.K. Gupta et al.

    Structures of human pumilio with noncognate RNAs reveal molecular mechanisms for binding promiscuity

    Structure

    (2008)
  • M.A. Andrade et al.

    Comparison of ARM and HEAT protein repeats

    J. Mol. Biol.

    (2001)
  • A. Perez-Riba et al.

    The tetratricopeptide-repeat motif is a versatile platform that enables diverse modes of molecular recognition

    Curr. Opin. Struct. Biol.

    (2019)
  • R. Tewari et al.

    Armadillo-repeat protein functions: questions for little creatures

    Trends Cell Biol.

    (2010)
  • P.L. Clark

    How to build a complex, functional propeller protein, from parts

    Trends Biochem. Sci.

    (2016)
  • L. Paladin et al.

    Comparison of protein repeat classifications based on structure and sequence families

    Biochem. Soc. Trans.

    (2015)
  • R. Espada et al.

    Repeat proteins challenge the concept of structural domains

    Biochem. Soc. Trans.

    (2015)
  • L.K. Mosavi et al.

    The ankyrin repeat as molecular architecture for protein recognition

    Protein Sci. Publ. Protein Soc.

    (2004)
  • T.F. Smith

    Diversity of WD-repeat proteins

    Subcell. Biochem.

    (2008)
  • M. Delucchi et al.

    A new census of protein tandem repeats and their relationship with intrinsic disorder

    Genes

    (2020)
  • A. Schüler et al.

    Evolution of Protein Domain Repeats in Metazoa

    Mol. Biol. Evol.

    (2016)
  • E. Schaper et al.

    The evolution and function of protein tandem repeats in plants

    New Phytol.

    (2015)
  • M. Liu et al.

    Exon-domain correlation and its corollaries

    Bioinforma. Oxf. Engl.

    (2005)
  • B. Lorente-Galdos

    Accelerated exon evolution within primate segmental duplications

    Genome Biol.

    (2013)
  • E. Dohmen et al.

    The modular nature of protein evolution: domain rearrangement rates across eukaryotic life

    BMC Evol. Biol.

    (2020)