Journal of Molecular Biology
Volume 432, Issue 19, 4 September 2020, Pages 5447-5459

Predicting Secondary Structure Propensities in IDPs Using Simple Statistics from Three-Residue Fragments

https://doi.org/10.1016/j.jmb.2020.07.026

Highlights

  • Partially structured motifs in IDPs are important for function.

  • LS2P is a new method to predict secondary structures in IDPs from their sequences.

  • LS2P uses a simple statistical approach based on a structural database of three-residue fragments.

  • LS2P connects sequence with structural features and experimental observations.

  • LS2P is publicly available through a web server.

Abstract

Intrinsically disordered proteins (IDPs) play key functional roles facilitated by their inherent plasticity. In most cases, IDPs recognize their partners through partially structured elements inserted in fully disordered chains. The identification and characterization of these elements is fundamental to understanding the functional mechanisms of IDPs. Although several computational methods have been developed to identify order within disordered chains, most current secondary structure predictors are focused on globular proteins and are not necessarily appropriate for IDPs. Here, we present a comprehensible method, called Local Structural Propensity Predictor (LS2P), to predict secondary structure elements from IDP sequences. LS2P performs statistical analyses from a database of three-residue fragments extracted from coil regions of high-resolution protein structures. In addition to identifying sparsely populated helical and extended regions, the method pinpoints short stretches triggering β-turn formation or promoting α-helices. The simplicity of the method enables a direct connection between experimental observations and structural features encoded in IDP sequences.

Introduction

Intrinsically disordered proteins (IDPs) have emerged as key actors in a multitude of relevant biological processes such as signaling, regulation and homeostasis [1, 2, 3]. Moreover, malfunction of IDPs has been linked to a large proportion of cancers and neurodegenerative and cardiovascular diseases [4]. IDPs perform highly specialized functions despite being devoid of permanent secondary or tertiary structure. Indeed, their malleability enables biological tasks that are out of reach for their globular counterparts [5]. In most cases, function is manifested when these flexible proteins interact with globular partners to trigger signaling or metabolic cascades [6]. These interactions are normally of low or moderate affinity, giving rise to fuzzy complexes where the IDP remains flexible upon binding [7,8]. They are often mediated by short linear motifs or molecular recognition elements that specifically recognize the surface of the partner [9, 10, 11, 12]. The presence of partially structured elements in short linear motifs tunes the thermodynamics and kinetics of the interaction, often assisted by their flanking regions [13]. Structural and electrostatic changes induced by post-translational modifications can also modulate the affinity of the interaction and represent efficient mechanisms of regulation [14,15].

The identification and characterization of partially structured elements in IDPs is complex and requires extensive experimental work, mainly using NMR. In particular, NMR chemical shifts and residual dipolar couplings (RDCs) are sensitive to small populations of secondary structural elements [16, 17, 18]. Computational tools represent a good complement or an alternative to experimental studies to localize such structurally biased elements. For over 40 years, numerous methods have been developed to predict secondary structure in proteins from their amino acid sequence [19]. However, current secondary structure predictors are in general trained and evaluated on folded/globular proteins, and thus are not necessarily appropriate to identify partially structured regions in IDPs. Numerous methods have also been proposed to predict structural disorder from protein sequence ([20,21] and references therein). Most of the available disorder predictors focus on the identification of disordered regions in predominantly folded proteins. In general, they only provide a binary output (i.e. ordered/disordered) or a residue-specific disorder probability, but do not identify structural classes. Because they aim to provide different information, secondary structure and disorder predictors have traditionally been developed independently of each other. One exception is the s2D method [22], which predicts secondary structure populations and disorder in a unified framework. s2D, like the work presented here, relies on a more holistic view of IDPs by exploring structural descriptors that span the continuum between ordered and disordered proteins [23,24,12].

In contrast to the most recent approaches, which are based on intricate machine-learning techniques, here we present an extremely simple strategy to identify secondary structural propensities from protein sequences. Like machine-learning-based approaches, our method exploits structural information contained in databases. However, instead of training a machine-learning model or architecture, our approach performs simple statistical operations. These operations are based on a classification of the conformational preferences of three-residue fragments extracted from coil regions of experimentally determined high-resolution protein structures. Although small, tripeptides have been shown to encode relevant sequence-dependent structural information [25] and are valuable building blocks to model unfolded states and disordered proteins or regions [26, 27, 28]. Furthermore, statistical analyses of three-residue fragments have also been used as key components of knowledge-based potentials and protein fold recognition methods [29,30].

We have evaluated the performance of our method, called local structural propensity predictor (LS2P), using a benchmark of nine well-characterized IDPs. LS2P accurately predicts previously identified helical and extended regions in the benchmark. Moreover, small stretches forming β-turns or promoting α-helices emerge from the analysis of the preferred structural classes of the tripeptides within the local sequence context. The main advantage of our strategy with respect to most machine-learning-based methods for secondary structure prediction, especially those using neural networks, is that it enables a comprehensible connection between amino acid sequence and structural preferences. LS2P is publicly available through a web server at: https://moma.laas.fr/applications/LS2P

Section snippets

Theory

The prediction method proposed in this work, LS2P, exploits statistical information about the structural preferences of three-residue fragments, called tripeptides from now on. This information was extracted from a structural database constructed from coil regions in high-resolution protein structures. Details about the tripeptide database construction can be found in the Materials and Methods section.
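As a rough illustration of the idea described above, the following Python sketch slides a three-residue window along a sequence and looks up, for each tripeptide, the fraction of database occurrences falling in each structural class. It is a minimal sketch under stated assumptions, not the LS2P implementation: the class labels ("H", "E", "O"), the toy counts, and the function names are all hypothetical, and the actual statistical operations performed by LS2P are not reproduced here.

```python
from collections import Counter

# Hypothetical table: tripeptide -> number of database fragments per structural class.
# The labels ("H", "E", "O") and all numbers are invented for illustration only.
TOY_COUNTS = {
    "AAA": Counter({"H": 60, "E": 10, "O": 30}),
    "AAG": Counter({"H": 20, "E": 20, "O": 60}),
    "AGP": Counter({"H": 5, "E": 45, "O": 50}),
}


def tripeptides(sequence):
    """Return the overlapping three-residue windows of a sequence."""
    return [sequence[i:i + 3] for i in range(len(sequence) - 2)]


def class_frequencies(counts):
    """Normalise raw class counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def window_propensities(sequence, table):
    """Pair each window with its class fractions (None if the tripeptide is absent)."""
    result = []
    for tri in tripeptides(sequence):
        counts = table.get(tri)
        result.append((tri, class_frequencies(counts) if counts else None))
    return result


profile = window_propensities("AAAGP", TOY_COUNTS)
for tri, freqs in profile:
    print(tri, freqs)
```

Because every number in the output traces back to a count for one specific tripeptide, a profile of this kind can be inspected window by window, which is the sort of direct link between sequence and structural preference that the text emphasises.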

To simplify the structural classification, the conformational space of each residue ri was

Identification of secondary structure propensities in IDPs: An overall picture

A benchmark set of nine structurally well-characterized IDPs was used to evaluate the performance of our approach. Concretely, MAPK kinase 7 (MKK7) [32], the fragment 945–1097 of the erythrocyte binding antigen 181 (EBA-181) [33], p15 [34], Sic1 [14], measles virus ntail (ntailMV) [35], Sendai virus ntail (ntailSV) [36], the unique domain of the src kinase (USrc) [37], K18 construct of Tau protein (K18) [38], and full-length Tau protein [39] were used in our study. Predictions of secondary

Discussion

In this work, we have investigated the ability to predict secondary structure propensities within IDPs using local sequence-dependent information encoded in small protein fragments extracted from coil regions in high-resolution protein structures. We have developed an extremely simple statistical approach based on a coarse classification of tripeptide conformations. In contrast with the currently popular neural-network-based secondary structure predictors, this approach enables a comprehensible

Tripeptide database

The tripeptide database was built from a curated database of high-resolution experimentally determined protein structures. More precisely, we used protein domains from the SCOPe [55] 2.06 release. In order to remove highly redundant sequences, we used the 95% sequence-identity-filtered subset of these domains. This subset consists of PDB-style files for 28,011 domains. DSSP [45] was employed to assign secondary structure labels to each residue in these files.
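The extraction step implied by the paragraph above can be sketched as follows: given a residue sequence and its per-residue DSSP string, collect the three-residue fragments that lie in coil regions. This is a hedged illustration only. The function name, the toy sequence, and in particular the definition of "coil" used here (a fragment is kept when none of its residues is labelled `H` or `E`) are assumptions, since the exact criteria used to build the database are not given in this excerpt.

```python
from collections import Counter


def coil_tripeptides(sequence, dssp, structured=frozenset("HE")):
    """Yield three-residue fragments with no residue in a `structured` DSSP class.

    `structured` is an assumed definition of non-coil states (alpha-helix "H"
    and beta-strand "E"); adjust it to match the criteria actually used.
    """
    if len(sequence) != len(dssp):
        raise ValueError("sequence and DSSP string must have equal length")
    for i in range(len(sequence) - 2):
        if not any(code in structured for code in dssp[i:i + 3]):
            yield sequence[i:i + 3]


# Toy input, invented for illustration: "-" marks residues without a DSSP label.
seq = "MKTAYGSPLE"
dssp = "-HHHH-TT--"
counts = Counter(coil_tripeptides(seq, dssp))
print(counts)
```

Accumulating such counters over all domain files would give, for each tripeptide type, the number of coil fragments available for the subsequent structural classification.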

Each structure file was processed by

Availability

LS2P is publicly available through a web server at: https://moma.laas.fr/applications/LS2P.

The code of LS2P (in Python) and the data (number of structures for each tripeptide type and structural class extracted from high-resolution experimentally determined protein structures) are available upon request to the Lead Contact.

CRediT authorship contribution statement

Alejandro Estaña: Methodology, Data curation, Software, Writing - original draft. Amélie Barozet: Methodology, Writing - review & editing. Assia Mouhand: Investigation, Data curation. Marc Vaisset: Data curation, Software. Christophe Zanon: Software. Pierre Fauret: Software. Nathalie Sibille: Investigation, Writing - review & editing. Pau Bernadó: Conceptualization, Investigation, Supervision, Writing - original draft, Writing - review & editing. Juan Cortés: Conceptualization, Methodology,

Acknowledgments

This work was supported by the European Research Council under the H2020 Programme (2014–2020) chemREPEAT (648030) and Labex EpiGenMed (ANR-10-LABX-12-01) awarded to P.B., and the ANR GPCteR (ANR-17-CE11-0022-01) to N.S. The CBS is a member of France-BioImaging (FBI) and the French Infrastructure for Integrated Structural Biology (FRISBI), two national infrastructures supported by the French National Research Agency (ANR-10-INBS-04-01 and ANR-10-INBS-05, respectively).

Declaration of Competing Interest

The authors declare no conflict of interest.

References (55)

  • A. Estaña et al.

    Realistic ensemble models of intrinsically disordered proteins using a structure-encoding coil database

    Structure

    (2019)
  • M. Blanc et al.

    Intrinsic disorder within the erythrocyte binding-like proteins from Plasmodium falciparum

    Biochim. Biophys. Acta

    (2014)
  • A. De Biasio et al.

    p15PAF is an intrinsically disordered protein with nonrandom structural preferences at sites of interaction with other proteins

    Biophys. J.

    (2014)
  • Y. Pérez et al.

    Structural characterization of the natively unfolded N-terminal domain of human c-Src kinase: Insights into the role of phosphorylation of the unique domain

    J. Mol. Biol.

    (2009)
  • M. Schwalbe et al.

    Predictive atomic resolution descriptions of intrinsically disordered hTau40 and α-synuclein in solution from NMR and small angle scattering

    Structure

    (2014)
  • R. Linding et al.

    Protein disorder prediction: implications for structural proteomics

    Structure

    (2003)
  • J. Hanson et al.

    SPOT-Disorder2: improved protein intrinsic disorder prediction by ensembled deep learning

    Genom. Proteom. Bioinf.

    (2019)
  • T. Mittag et al.

    Structure/function implications in a dynamic complex of the intrinsically disordered Sic1 with the Cdc4 subunit of an SCF ubiquitin ligase

    Structure

    (2010)
  • J. Zimmerman et al.

    Characterization of amino acid sequences in proteins by statistical methods

    J. Theor. Biol.

    (1968)
  • J.S. Richardson

    The anatomy and taxonomy of protein structure

  • V. Csizmok et al.

    Dynamic protein interaction networks and new structural paradigms in signaling

    Chem. Rev.

    (2016)
  • P.E. Wright et al.

    Intrinsically disordered proteins in cellular signalling and regulation

    Nat. Rev. Mol. Cell Biol.

    (2015)
  • V. N. Uversky, C. J. Oldfield, A. K. Dunker (2008). Intrinsically disordered proteins in human diseases: introducing...
  • H. Xie et al.

    Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions

    J. Proteome Res.

    (2007)
  • M. Fuxreiter

    Fuzziness: linking regulation to protein dynamics

    Mol. BioSyst.

    (2012)
  • K. Van Roey et al.

    Short linear motifs: Ubiquitous and functionally diverse protein interaction modules directing cell regulation

    Chem. Rev.

    (2014)
  • R. Pancsa et al.

    Interactions via intrinsically disordered regions: what kind of motifs?

    IUBMB Life

    (2012)