Comprehensive predictions of secondary structures for comparative analysis in different species

doi:10.1016/j.jsb.2021.107735

Journal of Structural Biology

Volume 213, Issue 2, June 2021, 107735

https://doi.org/10.1016/j.jsb.2021.107735

Abstract

Protein structures are directly linked to biological functions. However, there is a knowledge gap between the decoded genome and the structure. To bridge the gap, we focused on the secondary structure (SS). From a comprehensive analysis of the predicted SS of proteins in different types of organisms, we arrived at the following findings. The proportions of SS in genomes differed among phylogenic domains. The distributions of strand lengths indicated structural limitations in all of the species. Unlike bacteria and archaea, eukaryotes have an abundance of α-helical and random coil proteins. Interestingly, there was a relationship between SS and post-translational modifications. By calculating the hydrophobic moments of helices and strands, we found highly amphipathic SS fragments, which might be related to biological functions. In conclusion, comprehensive predictions of SS will provide valuable perspectives for understanding the entire set of protein structures in genomes and will help in discovering or designing functional proteins.

Introduction

Protein structures are directly linked to their biological functions. The primary structure of proteins (the amino-acid sequence) has become abundantly available as genomes have been decoded in numerous species. In contrast, tertiary structures remain largely undetermined because of the cost and difficulty of determining crystal structures. Consequently, there is a gap between genome and structure that hinders understanding of their relationship. To bridge the gap, we focused on the secondary structure (SS), which is a fundamental element of protein folding and can be predicted accurately from an amino-acid sequence. In this study, we predicted the SSs of proteins from the whole genomes of various species.

Generally, SSs of proteins are formed periodically via hydrogen bonds between amide and carbonyl groups in the backbone. Historically, the existence of the α-helix and β-sheet was proposed by Pauling, Corey, and Branson and later confirmed as SSs of proteins (Pauling and Corey, 1951, Pauling et al., 1951). Based on combinations of α-helices and β-sheets, three-dimensional structures of proteins have been classified into superfolds such as the TIM barrel, α/β sandwich, and doubly wound folds (Andreeva et al., 2014, Orengo et al., 1994).

Many studies have attempted to predict SSs from amino-acid sequences. Historically, the Chou–Fasman parameters were defined as ratios of probabilities of SS formation (Chou and Fasman, 1974). These parameters represent the SS frequencies of each residue rather than the context of the amino-acid sequence. As an alternative, the Garnier–Osguthorpe–Robson (GOR) method uses conditional probabilities of SS that take neighboring residues into account (Garnier et al., 1978). Recent SS predictors often adopt machine learning algorithms such as neural networks, which predict SSs using both amino-acid sequences and sequence homologies. Owing to the progress of these algorithms, the accuracy of SS prediction now exceeds 80%, within a theoretical limit (Smolarczyk et al., 2020).
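As an illustration of what a "ratio of probabilities" means for a Chou–Fasman-style parameter, the following minimal sketch computes, for each residue type, the probability of a state given that residue divided by the overall probability of that state. The function name `propensities` and the toy sequence and state strings are hypothetical and are not taken from the original work; published Chou–Fasman parameters were derived from a curated set of solved structures, not from a single short example like this one.

```python
from collections import Counter


def propensities(sequence: str, states: str) -> dict:
    """Return {(residue, state): propensity} for aligned strings.

    Propensity = P(state | residue) / P(state). A value above 1 means the
    residue occurs in that state more often than the average residue does.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    total = len(sequence)
    residue_counts = Counter(sequence)
    state_counts = Counter(states)
    pair_counts = Counter(zip(sequence, states))

    result = {}
    for (residue, state), n in pair_counts.items():
        p_state_given_residue = n / residue_counts[residue]
        p_state = state_counts[state] / total
        result[(residue, state)] = p_state_given_residue / p_state
    return result


# Hypothetical toy data: H = helix, E = strand, C = coil (assumed labels).
toy_sequence = "AAEELLKKGVVIVG"
toy_states = "HHHHHHHHCEEEEC"

for (residue, state), value in sorted(propensities(toy_sequence, toy_states).items()):
    print(f"{residue} in {state}: {value:.2f}")
```

Because each value depends only on the residue itself, the sketch also makes the stated limitation concrete: nothing about neighboring residues enters the calculation, which is the gap that the GOR method addresses.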

In this study, we predicted the SSs of proteins from the amino-acid sequences of various species (i.e., eukaryotes, archaea, and bacteria) and clarified the differences in SSs among the organisms. For eukaryotes, we considered the fission yeast Schizosaccharomyces pombe (Sz. pombe) and the budding yeast Saccharomyces cerevisiae (Sc. cerevisiae) because they are well studied as unicellular model organisms. The genomes of both yeasts have been decoded, and many annotations are available for their proteins (Cherry et al., 2012, Goffeau et al., 1996, Lock et al., 2019, Wood et al., 2002). For archaea, we considered the euryarchaeon Methanocaldococcus jannaschii (M. jannaschii) and the crenarchaeon Pyrobaculum aerophilum (P. aerophilum). M. jannaschii is a thermophilic methanogen whose genome was decoded in 1996 as the first among archaea (Bult et al., 1996). P. aerophilum is a thermophilic archaeon in a different phylum from M. jannaschii, and its genome was decoded in 2002 (Fitz-Gibbon et al., 2002). For bacteria, we considered Escherichia coli (E. coli), which is generally recognized as both a model bacterium and a tool for molecular biology, and whose genome was decoded in 1997 (Blattner et al., 1997).

The differences in genome sequences among species have provided valuable information for understanding evolution and the principles of protein folding. Accordingly, we predicted the SSs of all proteins in the different species and compared their properties among the phylogenic domains. Additionally, we carefully considered a possible relationship between SSs and the biological functions of proteins.

Section snippets

Dataset preparations

The protein sequences were obtained from the following public databases: (1) Sz. pombe, PomBase (pombase.org) (Lock et al., 2019, Wood et al., 2002); (2) Sc. cerevisiae, Saccharomyces Genome Database (www.yeastgenome.org) (Cherry et al., 2012, Goffeau et al., 1996); (3) E. coli, Profiling of E. coli Chromosome (shigen.nig.ac.jp/ecoli/pec/index.jsp, GenBank: U00096); (4) M. jannaschii, UniProt (GenBank: L77117) (Bult et al., 1996); and (5) P. aerophilum, UniProt (GenBank: CP000561). (Fitz-Gibbon

Overview of SS prediction

In eukaryotes, the SS proportions predicted by PSIPRED were 40.6% for helix, 10.3% for strand, and 49.1% for coil (Fig. 1(a, b)). Bacteria and archaea appeared to be rich in strands, whereas proteins with coils and helices were relatively abundant in eukaryotes. For all the species, we plotted the distributions of SSs as a function of SS length (Fig. 1(c)). Note that very short SS fragments might have been poorly predicted owing to the low confidence scores of PSIPRED. As shown in Fig. 1
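The two quantities discussed in this section, SS proportions and SS length distributions, can be sketched from three-state prediction strings as follows. This is only an illustrative sketch: it assumes each protein's prediction is available as a string over H (helix), E (strand), and C (coil), and that proportions are pooled per residue over all proteins, which may differ from the procedure actually used. The function names and toy strings are hypothetical.

```python
from collections import Counter
from itertools import groupby

STATES = "HEC"  # helix, strand, coil (assumed three-state alphabet)


def ss_proportions(predictions):
    """Fraction of residues in each state, pooled over all proteins."""
    counts = Counter()
    for ss in predictions:
        counts.update(ss)
    total = sum(counts[s] for s in STATES)
    return {s: counts[s] / total for s in STATES}


def segment_lengths(predictions):
    """Lengths of consecutive runs of each state, collected per state.

    Runs are computed within each protein, so a segment never spans
    two different proteins.
    """
    lengths = {s: [] for s in STATES}
    for ss in predictions:
        for state, run in groupby(ss):
            if state in lengths:
                lengths[state].append(sum(1 for _ in run))
    return lengths


# Hypothetical toy predictions for two short proteins.
toy_predictions = ["CCHHHHHHCCEEEECCHHHC", "CEEEEECCC"]

print(ss_proportions(toy_predictions))
print(segment_lengths(toy_predictions))
```

A histogram of each list returned by `segment_lengths` would give a length distribution per state. Filtering out very short runs before plotting is one possible way to handle the low-confidence short fragments mentioned above, although whether such a filter was applied is not stated here.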

Conclusion

In the present study, we predicted the SSs of proteins in several species using PSIPRED v4.0. We showed that the SS proportions and the SS lengths differed among the three biological domains (i.e., eukaryotes, bacteria, and archaea). In particular, we found a structural convergence toward strand in all of the species. In contrast, the reason why eukaryotes have an abundance of helices and coils is still unclear. The SS profiling of early eukaryotes will provide phylogenic information to address

CRediT authorship contribution statement

Rikuri Morita: Conceptualization, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Visualization. Yasuteru Shigeta: Validation, Writing - review & editing. Ryuhei Harada: Validation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References (45)

A. Bah et al., Modulation of intrinsically disordered protein function by post-translational modifications, J. Biol. Chem. (2016)
J.R. Beckley et al., A degenerate cohort of yeast membrane trafficking DUBs mediates cell polarity and survival, Mol. Cell. Proteomics (2015)
A. Carpy et al., Absolute proteome and phosphoproteome dynamics during the cell cycle of Schizosaccharomyces pombe (Fission Yeast), Mol. Cell. Proteomics (2014)
J. Garnier et al., Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins, J. Mol. Biol. (1978)
D.T. Jones, Protein secondary structure prediction based on position-specific scoring matrices, J. Mol. Biol. (1999)
A.N. Kettenbach et al., Quantitative phosphoproteomics reveals pathways for coordination of cell growth and division by the conserved fission yeast kinase pom1, Mol. Cell. Proteomics (2015)
S. Kosugi et al., Six classes of nuclear localization signals specific to different binding grooves of importin alpha, J. Biol. Chem. (2009)
J. Kyte et al., A simple method for displaying the hydropathic character of a protein, J. Mol. Biol. (1982)
C. Lawless et al., Direct and absolute quantification of over 1800 yeast proteins via selected reaction monitoring, Mol. Cell. Proteomics (2016)
P. Nurse, A long twentieth century of the cell cycle and beyond, Cell (2000)
C. Nick Pace et al., A helix propensity scale based on experimental studies of peptides and proteins, Biophys. J. (1998)
M.P. Swaffer et al., Quantitative phosphoproteomics reveals the signaling dynamics of cell-cycle kinases in the fission yeast Schizosaccharomyces pombe, Cell Rep. (2018)
G. Abrusán et al., Alpha helices are more robust to mutations than beta strands, PLoS Comput. Biol. (2016)
A. Andreeva et al., SCOP2 prototype: a new approach to protein structure mining, Nucl. Acids Res. (2014)
A. Bachmair et al., In vivo half-life of a protein is a function of its amino-terminal residue, Science (1986)
F.R. Blattner et al., The complete genome sequence of Escherichia coli K-12, Science (1997)
H.P. Bogerd et al., Protein sequence requirements for function of the human T-cell leukemia virus type 1 Rex nuclear export signal delineated by a novel in vivo randomization-selection assay, Mol. Cell. Biol. (1996)
C.J. Bult et al., Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii, Science (1996)
J.M. Cherry et al., Saccharomyces Genome Database: the genomics resource of budding yeast, Nucl. Acids Res. (2012)
P.Y. Chou et al., Prediction of protein conformation, Biochemistry (1974)
D. Eisenberg et al., The helical hydrophobic moment: a measure of the amphiphilicity of a helix, Nature (1982)
S.T. Fitz-Gibbon et al., Genome sequence of the hyperthermophilic crenarchaeon Pyrobaculum aerophilum, Proc. Natl. Acad. Sci. U.S.A. (2002)
