Predicting eukaryotic protein secretion without signals

https://doi.org/10.1016/j.bbapap.2018.11.011

Highlights

  • Predicting secretion without signal peptides in eukaryotes is a hard problem.

  • The SecretomeP method from 2004 is well known and extensively used.

  • A new benchmark shows that SecretomeP performance is much lower than anticipated.

  • Other available methods are only slightly better.

Abstract

Predicting unconventional protein secretion is a much harder problem than predicting signal peptide-based protein secretion, both due to the small number of examples and due to the heterogeneity and the limited knowledge of the pathways involved, especially in eukaryotes. However, the idea that secreted proteins share certain properties regardless of the secretion pathway used made it possible to construct the prediction method SecretomeP in 2004. Here, we take a critical look at SecretomeP and its successors, and we also discuss whether multi-category subcellular location predictors can be used to predict unconventional protein secretion in eukaryotes. A new benchmark shows SecretomeP to perform much worse than initially estimated, casting doubt on the underlying hypothesis. On a more positive note, recent developments in machine learning may have the potential to construct new methods which can not only predict unconventional protein secretion but also point out which parts of a sequence are important for secretion.

Introduction

Prediction of classical signal peptide-based protein secretion has a long history in bioinformatics, with the earliest methods being published in the 1980s [1], [2], [3]. The secretory signal peptide is probably the best-known and best-described protein sorting signal, and the large interest in signal peptide prediction is reflected by the high number of citations to the papers describing the SignalP method [4], [5], [6], which has been available online since 1996 and is currently in version 4.1 [7].

SignalP is an example of a signal-based method for protein sorting prediction, where the computational model recognizes the actual sorting signal. The two other approaches are global property-based methods and homology-based methods [8]. Global property-based methods exploit the fact that proteins in different compartments have different physicochemical properties, which is reflected in e.g. different amino acid compositions, especially regarding the surfaces of the proteins [9]. The earliest method for distinguishing between intra- and extracellular proteins based on amino acid and amino acid pair compositions was published in 1994 [10]. Homology-based methods, on the other hand, exploit the fact that proteins tend to stay in the same compartment during the course of evolution, meaning that subcellular location can often be inferred by homology to proteins with known location [11].
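The global property-based idea described above can be made concrete with a small sketch. The Python code below turns a protein sequence into a vector of amino acid frequencies and amino acid pair frequencies, which is the kind of input such a classifier could use. It is only an illustration under stated assumptions: the function names and the example sequence are hypothetical, and the code does not reproduce the feature set of the 1994 method [10] or of any other published predictor.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the relative frequency of each standard amino acid in seq."""
    seq = seq.upper()
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pair_composition(seq):
    """Return the relative frequency of each adjacent amino acid pair.

    Pairs that contain a non-standard residue (e.g. X) are skipped.
    """
    seq = seq.upper()
    pairs = [a + b for a, b in zip(seq, seq[1:])
             if a in AMINO_ACIDS and b in AMINO_ACIDS]
    counts = Counter(pairs)
    total = len(pairs)
    return {a + b: (counts[a + b] / total if total else 0.0)
            for a, b in product(AMINO_ACIDS, repeat=2)}


def feature_vector(seq):
    """Concatenate 20 single-residue and 400 pair frequencies (hypothetical layout)."""
    comp = aa_composition(seq)
    pairs = pair_composition(seq)
    return [comp[aa] for aa in AMINO_ACIDS] + [pairs[k] for k in sorted(pairs)]


if __name__ == "__main__":
    example = "MKKLLA"  # made-up sequence, for illustration only
    vec = feature_vector(example)
    print(len(vec))
```

A vector like this carries no information about where in the sequence a sorting signal sits, which is exactly what separates the global property-based approach from signal-based methods such as SignalP.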

However, not all secreted proteins follow the “classical” signal peptide-dependent pathway. An increasing number of eukaryotic proteins have been found to be released without passing the endomembrane system, including proteins with very important functions like cytokines [12]. Such proteins will go undetected by signal peptide-dependent prediction methods such as SignalP.

When attempting to predict which proteins are secreted by unconventional “non-classical” signal peptide-independent routes, especially in eukaryotes, one is faced with two obstacles. First, the signal-based approach is not available, since it is generally not known where in the sequence the signals for secretion occur. Second, the number of experimentally confirmed examples from which to build a training set is extremely small.

In bacteria, the situation is different, since there are many more examples known of signal peptide-independent secretion (rarely termed “non-classical” in bacteria). In Gram-negative bacteria, the type I, III, IV, and VI secretion pathways function without signal peptides, and in some cases, there is evidence of N-terminal or C-terminal sorting signals [8,13]. In Gram-positive bacteria, there are also a few known pathways (Wss, holin, and SecA2) [13,14]. This paper will discuss prediction of non-classical secretion in eukaryotes only; prediction in bacteria has been described elsewhere [8,14].

The SecretomeP method

SecretomeP is a method from 2004 [15] for predicting non-classically secreted proteins from Mammalia. It was published by our former colleagues in the Center for Biological Sequence Analysis, which was later transformed into the Department of Bio and Health Informatics. SecretomeP 2.0, published in 2005 [16], added the possibility for prediction in Gram-positive and Gram-negative bacteria; the mammalian part was not modified or retrained.

The authors chose a novel way to deal with the two obstacles

Other dedicated methods

Besides SecretomeP, we are aware of five other published methods specifically designed to predict secretion without signal peptides in eukaryotes. These predictive tools have been summarized in Table 1.

Interestingly, all these methods, like SecretomeP, focus on mammalian proteins alone; no method is available for non-mammal eukaryotes. However, none of the papers actually argue for that choice or cite any references showing that non-classical secretion in mammals differs from the process in,

Multi-location predictors

Besides methods that predict whether or not a protein is secreted, there are also several methods available which predict a larger number of subcellular locations, including “secreted” or “extracellular”. Such multi-location predictors could potentially also be used to predict secretion without signal peptides. However, since the majority of secreted proteins have signal peptides, some kind of signal peptide prediction will usually be built into such methods, either implicitly or explicitly. If

A critical re-evaluation of SecretomeP performance

In the years since SecretomeP was first developed, considerably more data have become available on proteins that are secreted in a non-classical manner. In addition, a common question put to the curators of the SecretomeP web service is whether it performs as well on all eukaryotic sequences as it does on mammalian sequences. This presents an opportunity for a critical re-evaluation of SecretomeP's performance.

We collected two data sets from UniProt, one with

Discussion

SecretomeP version 1 was, for its time, a bold and innovative suggestion for how to construct a predictor for secretion without signal peptides. It has been cited >800 times according to Google Scholar, and it is still being used extensively. However, its performance, measured on new independent data, is not nearly as good as we thought it would be, and the underlying hypothesis that extracellular proteins share features independent of the secretion pathway must be called into question.

SRTpred

Acknowledgements

The corresponding author is paid by the Technical University of Denmark. The authors (LZ and KS) thank the research commission of the University Hospital Düsseldorf for funding (FOKO 2018-27). The authors wish to thank Krishna Kumar Kandaswamy from the SPRED team for assistance with running the program.

References (46)

  • L. Yu et al.

    SecretP: a new method for predicting mammalian secreted proteins

    Peptides

    (2010)
  • L. Yu et al.

    SecretP: identifying bacterial secreted proteins by fusing new features into Chou's pseudo-amino acid composition

    J. Theor. Biol.

    (2010)
  • K.K. Kandaswamy et al.

    SPRED: a machine learning approach for the identification of classical and non-classical secretory proteins in mammalian genomes

    Biochem. Biophys. Res. Commun.

    (2010)
  • W.-L. Huang

    Ranking Gene Ontology terms for predicting non-classical secretory proteins in eukaryotes and prokaryotes

    J. Theor. Biol.

    (2012)
  • G. von Heijne

    Patterns of amino acids near signal-sequence cleavage sites

    Eur. J. Biochem.

    (1983)
  • G. von Heijne

    A new method for predicting signal sequence cleavage sites

    Nucleic Acids Res.

    (1986)
  • H. Nielsen et al.

    Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites

    Protein Eng.

    (1997)
  • J.D. Bendtsen et al.

    Improved prediction of signal peptides: SignalP 3.0

    J. Mol. Biol.

    (2004)
  • T.N. Petersen et al.

    SignalP 4.0: discriminating signal peptides from transmembrane regions

    Nat. Meth.

    (2011)
  • H. Nielsen

    Predicting secretory proteins with SignalP

  • H. Nielsen

    Protein sorting prediction

  • R. Nair et al.

    Sequence conserved for subcellular localization

    Protein Sci.

    (2002)
  • H. Nielsen

    Predicting subcellular localization of proteins by bioinformatic algorithms
