Comprehensive predictions of secondary structures for comparative analysis in different species
Introduction
Protein structures are directly linked to their biological functions. The primary structure of a protein, its amino-acid sequence, has become abundantly available as the genomes of numerous species have been decoded. In contrast, tertiary structures have not been elucidated for many proteins because determining crystal structures is costly and difficult. A gap therefore remains between genome and structure that hinders understanding of how the two are related. To bridge this gap, we focused on the secondary structure (SS), which is a fundamental element of protein folding and can be predicted accurately from an amino-acid sequence. In this study, we predicted the SSs of proteins from the whole genomes of various species.
Generally, SSs of proteins are formed periodically via hydrogen bonds between amide and carbonyl groups in the backbone. Historically, the existence of the α-helix and the β-sheet was proposed by Pauling, Corey, and Branson and later confirmed in proteins (Pauling and Corey, 1951, Pauling et al., 1951). Based on combinations of α-helices and β-sheets, three-dimensional structures of proteins have been classified into superfolds such as the TIM barrel, the α/β sandwich, and the doubly wound fold (Andreeva et al., 2014, Orengo et al., 1994).
Many studies have attempted to predict SSs from amino-acid sequences. Historically, the Chou–Fasman parameters were defined as ratios of probabilities of SS formation (Chou and Fasman, 1974). These parameters represent the SS frequencies of each residue type rather than the context of the amino-acid sequence. As an alternative, the Garnier–Osguthorpe–Robson (GOR) method uses conditional probabilities of SS that take neighboring residues into account (Garnier et al., 1978). Recent SS predictors often adopt machine learning algorithms such as neural networks, which predict SSs from both amino-acid sequences and sequence homologies. Owing to the progress of these algorithms, the accuracy of SS prediction now exceeds 80%, though it remains bounded by a theoretical limit (Smolarczyk et al., 2020).
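As an illustration of a propensity defined as a ratio of probabilities, the following minimal sketch computes, for each residue type, the fraction of that residue found in a given SS state divided by the overall fraction of residues in that state. It is not the original Chou–Fasman parameter set or procedure; the function name, the three-state labels (H, E, C), and the toy input are assumptions made for illustration only.

```python
from collections import Counter


def ss_propensities(sequences, structures, state="H"):
    """Return {residue: f(state | residue) / f(state)} over aligned inputs.

    `sequences` and `structures` are parallel lists of equal-length strings;
    the labels H/E/C are an assumed three-state alphabet.
    """
    residue_total = Counter()      # occurrences of each residue type
    residue_in_state = Counter()   # occurrences of each residue type in `state`
    n_total = 0
    n_state = 0

    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must have equal length")
        for aa, label in zip(seq, ss):
            residue_total[aa] += 1
            n_total += 1
            if label == state:
                residue_in_state[aa] += 1
                n_state += 1

    if n_state == 0:
        return {}  # the state never occurs, so the ratio is undefined

    overall = n_state / n_total
    return {
        aa: (residue_in_state[aa] / residue_total[aa]) / overall
        for aa in residue_total
    }


# Toy example (hypothetical data, not taken from any database)
toy = ss_propensities(["AAGG"], ["HHCC"], state="H")
```

A value above 1 indicates that the residue occurs in the state more often than average, and a value below 1 indicates the opposite. Because each residue is scored independently, a table of this kind carries no information about neighboring residues, which is the limitation that context-based methods such as GOR address.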
In this study, we predicted the SSs of proteins from the amino-acid sequences of various species (i.e., eukaryotes, archaea, and bacteria) and clarified the differences in SSs among the organisms. For eukaryotes, we considered the fission yeast Schizosaccharomyces pombe (Sz. pombe) and the budding yeast Saccharomyces cerevisiae (Sc. cerevisiae) because they are well studied as unicellular model organisms. The genomes of both yeasts have been decoded, and their proteins are extensively annotated (Cherry et al., 2012, Goffeau et al., 1996, Lock et al., 2019, Wood et al., 2002). For archaea, we considered the euryarchaeon Methanocaldococcus jannaschii (M. jannaschii) and the crenarchaeon Pyrobaculum aerophilum (P. aerophilum). M. jannaschii is a thermophilic methanogen whose genome was decoded in 1996 as the first archaeal genome (Bult et al., 1996). P. aerophilum is a thermophilic archaeon in a different phylum from M. jannaschii, and its genome was decoded in 2002 (Fitz-Gibbon et al., 2002). For bacteria, we considered Escherichia coli (E. coli), which is widely recognized as both a bacterial model organism and a tool for molecular biology, and whose genome was decoded in 1997 (Blattner et al., 1997).
Differences in genome sequences among species have provided valuable information for understanding evolution and the principles of protein folding. With this in mind, we predicted the SSs of all proteins in the different species and compared their properties among the phylogenetic domains. We also carefully considered a possible relationship between the SSs and the biological functions of proteins.
Dataset preparations
The protein sequences were obtained from the following public databases: (1) Sz. pombe, PomBase (pombase.org) (Lock et al., 2019, Wood et al., 2002); (2) Sc. cerevisiae, Saccharomyces Genome Database (www.yeastgenome.org) (Cherry et al., 2012, Goffeau et al., 1996); (3) E. coli, Profiling of E. coli Chromosome (shigen.nig.ac.jp/ecoli/pec/index.jsp, GenBank: U00096); (4) M. jannaschii, UniProt (GenBank: L77117) (Bult et al., 1996); and (5) P. aerophilum, UniProt (GenBank: CP000561) (Fitz-Gibbon
Overview of SS prediction
In eukaryotes, the SS proportions predicted by PSIPRED were 40.6% for helix, 10.3% for strand, and 49.1% for coil (Fig. 1(a, b)). Bacterial and archaeal proteins appeared to be rich in strand, whereas coils and helices were relatively abundant in eukaryotic proteins. For all the species, we plotted the distributions of SSs as a function of SS length (Fig. 1(c)). Note that very short SS fragments might have been poorly predicted, as indicated by their low PSIPRED confidence scores. As shown in Fig. 1
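The two quantities discussed in this section, SS proportions and SS segment lengths, can be sketched as follows. This is a minimal illustration that assumes each prediction is available as a three-state string of H (helix), E (strand), and C (coil); the function names and the toy string are hypothetical, and this is not the authors' analysis code.

```python
from collections import Counter
from itertools import groupby

STATES = "HEC"  # assumed three-state alphabet: helix, strand, coil


def ss_proportions(ss_strings):
    """Fraction of residues in each state, pooled over all given strings."""
    counts = Counter()
    for ss in ss_strings:
        # Count only recognized labels so stray characters do not skew totals
        counts.update(label for label in ss if label in STATES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {state: counts[state] / total for state in STATES}


def segment_lengths(ss):
    """Lengths of consecutive runs of each state within one prediction."""
    lengths = {state: [] for state in STATES}
    for label, run in groupby(ss):
        if label in lengths:
            lengths[label].append(sum(1 for _ in run))
    return lengths


# Toy example (hypothetical prediction string)
example = "CCHHHHCCEEC"
proportions = ss_proportions([example])
runs = segment_lengths(example)
```

Pooling the run lengths from `segment_lengths` over all proteins of a species gives the kind of length distribution plotted per SS type. Runs of only one or two residues could be filtered out before plotting if low-confidence short fragments are a concern.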
Conclusion
In the present study, we predicted the SSs of proteins in several species using PSIPRED v4.0. We showed that the SS proportions and the SS lengths differed among the three biological domains (i.e., eukaryotes, bacteria, and archaea). In particular, we found a structural convergence toward strand in all of the species. In contrast, the reason why eukaryotes have an abundance of helices and coils remains unclear. The SS profiling of early eukaryotes will provide phylogenetic information to address
CRediT authorship contribution statement
Rikuri Morita: Conceptualization, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Visualization. Yasuteru Shigeta: Validation, Writing - review & editing. Ryuhei Harada: Validation, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References (45)
- et al. Modulation of intrinsically disordered protein function by post-translational modifications. J. Biol. Chem. (2016)
- et al. A degenerate cohort of yeast membrane trafficking DUBs mediates cell polarity and survival. Mol. Cell. Proteomics (2015)
- et al. Absolute proteome and phosphoproteome dynamics during the cell cycle of Schizosaccharomyces pombe (Fission Yeast). Mol. Cell. Proteomics (2014)
- et al. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. (1978)
- Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. (1999)
- et al. Quantitative phosphoproteomics reveals pathways for coordination of cell growth and division by the conserved fission yeast kinase pom1. Mol. Cell. Proteomics (2015)
- et al. Six classes of nuclear localization signals specific to different binding grooves of importin alpha. J. Biol. Chem. (2009)
- et al. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. (1982)
- et al. Direct and absolute quantification of over 1800 yeast proteins via selected reaction monitoring. Mol. Cell. Proteomics (2016)
- A long twentieth century of the cell cycle and beyond. Cell (2000)