1 Introduction

Substrate binding proteins (SBPs) initially recognize their substrates in the periplasmic space and deliver them to membrane-bound subunits that catalyze concentrative uptake into cells [1,2,3]. SBPs are associated with a range of protein systems, including ATP-binding cassette (ABC) transporters for substrate uptake, ion-gradient driven transporters, DNA binding proteins, and prokaryotic and eukaryotic channels and receptors [2,3,4,5]. In 1999, SBPs were classified based on their sequence similarities and the topological arrangement of their β-sheet region [6]. More recently, this classification was updated into seven clusters based on the SBP structures deposited in the Protein Data Bank (PDB), with each cluster having different structural characteristics [7]. These classified SBPs are involved in distinct molecular mechanisms in the functioning of transporters, channels, and signal transducers [7]. SBPs in the protein database vary in size from approximately 25 to 70 kDa [5, 8]. They have low sequence similarity but a highly conserved overall three-dimensional fold [5]. The core of an SBP consists of two structural α/β domains connected by a flexible hinge region [8], and the details of this architecture depend on the SBP cluster [7]. Substrate binding occurs at the flexible hinge region located between the two domains, stabilizing the closed form of the tightly packed protein with its substrate buried at the interface [5]. Typically, an SBP adopts four structural states during substrate recognition: (i) open-unliganded, (ii) open-liganded, (iii) closed-unliganded, and (iv) closed-liganded [5].

Rhodothermus marinus is a thermohalophilic bacterium that grows optimally at 65 °C [9]. We previously characterized a short-length Rhodothermus marinus SBP (named RmSBP) consisting of 138 amino acids, excluding the signal peptide [10]. The gene encoding this SBP is located near a methyl-accepting chemotaxis protein (MCP) gene, whose product is composed of signal peptide, transmembrane, HAMP, and methyl-accepting transducer regions [10]. A similar SBP-MCP gene cluster is also found in Rhodothermus profundi, Rhodothermaceae bacterium RA, Salinibacter ruber strain DSM 13855, and Salinibacter ruber strain M8 [10], indicating that short-length SBPs often occur in nature together with a counterpart MCP. We previously determined the crystal structure of RmSBP at pH 4.5 and identified a single α/β domain [10]. RmSBP showed high structural similarity to the C-terminal domains of the Streptococcus pneumoniae SBP (PDB code: 3LFT, r.m.s. deviation of 2.3 Å for 149 Cα atoms, named SpSBP) and the Vibrio cholerae serotype O1 SBP (3LKV, 2.5 Å for 149 Cα atoms, named VcSBP). The SpSBP and VcSBP structures contain bound L-tryptophan and L-phenylalanine, respectively. The residues that recognize these amino acids are not conserved in RmSBP [10]. Although the structural features of RmSBP have been analyzed, the mechanism by which it recognizes substrates is still unknown.

To better understand substrate recognition by the short-length RmSBP, we performed a study combining comparative structural analysis, computational substrate docking, and X-ray crystallography. We describe the topology of RmSBP relative to other SBPs and model potential substrate binding sites. The crystal structures of RmSBP at pH 5.5, 6.0, and 7.5 were determined at 1.5, 1.8, and 1.9 Å resolution, respectively. Structural flexibility was observed in the peripheral β1–α2 and β5–β6 loops of RmSBP, as well as in the extended C-terminal region. Our results provide an initial framework for understanding the molecular functions of short-length SBPs.

2 Materials and Methods

2.1 Comparative and Computational Analysis

The crystal structure of RmSBP at pH 4.5 (PDB code 5Z6V) was used as the starting point for the homolog search and substrate prediction study. Structural homolog models were searched for and evaluated using the Phyre2 server [11]. TM (template modeling) scores were also obtained using the Phyre2 server. The putative substrate binding sites were predicted using the 3DLigandSite server [12].

2.2 Protein Expression and Purification

Detailed protocols for cloning and protein expression have been reported in a previous study [10]. Briefly, the RmSBP gene, excluding the signal peptide, was cloned into the pET28 vector and expressed in E. coli BL21 (DE3) cells. Purified recombinant RmSBP was obtained by a two-step purification process using a Ni–NTA affinity column followed by size exclusion chromatography. The final purified protein was stored in 10 mM Tris–HCl, pH 8.0, and 200 mM NaCl.

2.3 Crystallization

Purified RmSBP was concentrated to 20 mg/mL using a Centricon concentrator (Millipore, 10 kDa cutoff). Crystallization screens were performed by the sitting-drop vapor diffusion method at 20 °C using commercial crystallization kits. Briefly, 0.3 μL of protein solution was mixed with 0.3 μL of precipitant solution and equilibrated against 70 μL of precipitant solution. Microcrystals were obtained under three different conditions: (i) 0.1 M Bis-Tris, pH 5.5, 0.2 M MgCl2, and 25% (w/v) polyethylene glycol 3350, (ii) 0.1 M MES, pH 6.0, and 1.26 M ammonium sulfate, and (iii) 0.1 M HEPES, pH 7.5, and 25% (w/v) polyethylene glycol 3350. Crystals suitable for X-ray diffraction were obtained by the sitting-drop vapor diffusion method at 20 °C by mixing 1.5 μL of protein solution with 1.5 μL of precipitant solution and equilibrating the drop against 200 μL of reservoir solution, using the crystallization solutions listed above.

2.4 Diffraction Data Collection

X-ray diffraction data for RmSBP crystals were collected at 100 K on beamline 7A at Pohang Light Source II (PLS-II, Korea) [13]. All crystals were equilibrated in a cryoprotectant solution consisting of the reservoir solution supplemented with 20% (v/v) ethylene glycol and then flash-cooled in a stream of liquid nitrogen. For the amino acid soaking experiment, RmSBP crystals were soaked for 1 min before data collection in a cryoprotectant solution supplemented with an amino acid mixture. The diffraction images were indexed, integrated, and scaled with the HKL2000 package [14]. The data collection statistics are listed in Table 1.

Table 1 Data collection and refinement statistics

2.5 Structure Determination

The initial phases of the RmSBP structures were obtained by molecular replacement using Phaser-MR in Phenix [15], with selenium-derived RmSBP at pH 4.5 (PDB code: 5Z6V) [10] as the search model. Manual model building was performed with the COOT program [16]. Model refinement was performed with Refmac5 [17] and Phenix refinement in Phenix [18]. The geometry of the refined models was evaluated using the MolProbity server [19]. The structure refinement statistics are listed in Table 1. Figures were generated with PyMOL [20]. Structure factors and coordinates have been deposited in the Protein Data Bank under PDB codes 6K1W (pH 5.5), 6K1X (pH 6.0), and 6K1Y (pH 7.5).

3 Results

3.1 Computational Analysis of RmSBP

All SBP structures reported to date recognize their substrates through the flexible hinge region between the two structural domains [7], whereas RmSBP has a single domain [10]. We previously suggested that RmSBP can recognize its substrates either alone or with the help of partner proteins [10]. In both cases, we hypothesized that structural changes in the peripheral regions of RmSBP are required to recognize the substrate molecule. To better understand the substrate recognition of RmSBP, we performed comparative analysis and substrate docking studies using the previously reported crystal structure of RmSBP (PDB code 5Z6V) as the initial model. The analysis using the Phyre2 server provided 19 models that were similar to RmSBP, along with 19 expected substrate binding sites. Among them, 9 models (PDB codes: 3LFT, 2QH8, 5ER3, 4RS3, 4KZK, 5BRA, 3KSM, 2DRI, and 6DSP) with TM-scores of > 0.52 were used in this study; all of them were either amino acid or sugar binding SBPs (Fig. 1a and Table 2). These models consisted of α/β fold structures and had low sequence identities of 12–23% with RmSBP. The superimposition of RmSBP with the other SBPs showed similarity in the core α/β domain over 73–119 Cα atoms (r.m.s. deviation of 1.753–2.133 Å), whereas the topology of the C-terminal residues differed in conformation (Fig. 1a). In the RmSBP structure, the β6-strand in the extended C-terminal domain formed an antiparallel β-sheet with the β5-strand of the α/β domain core. In addition, the α5- and α6-helices of RmSBP flanked the α/β domain. In contrast, the C-terminal regions of the other SBPs pointed upward toward the substrate binding site (Fig. 1a and b), which helps them associate with the partner α/β domain for target substrate recognition. RmSBP showed structural similarity not only to an SBP from a non-thermophilic bacterium identified by the Phyre2 server, but also to the C-terminal domain of an SBP from the thermophilic Aeropyrum pernix (Supplementary Fig. 1). Thus, the extended C-terminal domain of RmSBP has a topology markedly distinct from that of typical SBPs. Next, substrate binding sites were predicted using the 3DLigandSite software, which displayed 15 binding sites on the surface of the α/β domain using PLP (pyridoxal phosphate) as a model substrate. The results showed that residues in the N-terminus (Glu27, Val28, and Thr28), α1-helix (Gln32 and Gln33), β3–α4 loop (Leu106 and Glu107), and β5–β6 loop (Lys150) of RmSBP were predicted to be substrate binding sites (Fig. 1c). Among these residues, side-chain conformational changes of Gln32 and Gln33 are plausible, but large main-chain conformational changes are unlikely because these residues lie within a stable helix. Leu106 and Glu107, on the other hand, are located in a sharp turn of the β3–α4 loop, which limits large conformational changes. Based on the computational docking study, we considered that the unstructured N-terminus and the β5–β6 loop of RmSBP could show a certain amount of structural flexibility.

Fig. 1

Computational analysis of RmSBP. a Comparative analysis of the topologies of RmSBP and other SBPs. The α/β domain is indicated by a grey ribbon. α-helices, β-strands, and loops in the C-terminal domain are represented by red, yellow, and green ribbons, respectively. b Superimposition of RmSBP with structurally homologous SBPs. The C-terminal domains of RmSBP and the other SBPs are represented by blue and red ribbons, respectively. c Computational prediction of substrate binding to RmSBP. A total of 15 PLP molecules were used as model substrates and placed on the α/β domain of RmSBP. A cartoon representation of the predicted substrate recognition loops is also shown (colored blue). d Predicted substrate recognition residues are shown as blue sticks (Color figure online)

Table 2 Models used for structural comparison

3.2 Crystal Structures of RmSBPs

Our computational substrate docking study suggested that RmSBP might have structural flexibility in the peripheral loop region of the α/β domain and in the C-terminal domain. However, in the previously reported structure of RmSBP at pH 4.5, the peripheral loops and the α/β fold were highly rigid (see below). As a result, there was no experimental evidence supporting the structural flexibility of RmSBP suggested by the computational analysis. To test the computational results, we performed an extended crystallographic study to observe structural flexibility in the peripheral loops on the substrate recognition surface of RmSBP. We obtained RmSBP crystals at pH 5.5, 6.0, and 7.5 under different crystallization conditions. Crystals of RmSBP at pH 5.5 and 7.5 belonged to the orthorhombic space group P2₁2₁2₁, with similar unit-cell dimensions of approximately a = 46 Å, b = 48 Å, and c = 58 Å, and contained one molecule in the asymmetric unit (Table 1). The crystal of RmSBP at pH 6.0 belonged to the orthorhombic space group P2₁22₁, with unit-cell dimensions of 34 Å, 35 Å, and 115 Å, and contained one molecule in the asymmetric unit (Table 1). In contrast, the previously reported RmSBP crystal at pH 4.5 belonged to the monoclinic space group C21 [10]. The structures of RmSBP at pH 5.5, 6.0, and 7.5 were refined to resolutions of 1.5 Å, 1.8 Å, and 1.9 Å, with Rwork/Rfree values of 18.5%/21.8%, 19.0%/22.0%, and 17.8%/22.8%, respectively. All RmSBP structures at pH 5.5, 6.0, and 7.5 were composed of six α-helices and six β-strands, and formed an α/β fold with an extra domain at the C-terminal region (Fig. 2a). Detailed topology information for the RmSBP structure has been reported previously [10]. Here, we describe a novel finding concerning the peripheral flexible regions of RmSBP. Previously, we had classified the RmSBP structure as a single α/β domain [10]. In the present study, however, RmSBP was reclassified into an α/β domain (α1–α4 and β1–β5) and a C-terminal extended domain (α5, α6, and β6) through comparative analysis with other homologous SBP structures (Figs. 1a and 2a).

Fig. 2

Crystal structures of RmSBPs. a The overall structure of RmSBP (pH 7.5) consists of an α/β core domain (cyan) and an extended C-terminal domain (pink). b Superimposition of RmSBPs at pH 4.5 (red), 5.5 (yellow), 6.0 (green), and 7.5 (blue) is shown here. Conformational differences of RmSBP are observed at β1–α2 loop (yellow transparency) and C-terminal domain (orange transparency) (Color figure online)

For all RmSBP structures obtained at pH 5.5, 6.0, and 7.5, the electron density maps of the core of the α/β domain were well defined, but the peripheral loops and the C-terminus were disordered or structurally flexible. In RmSBP-pH 5.5, the electron density for four residues (Asp151, Ala152, Glu153, and Gly154) in the β5–β6 loop and two residues (Asp177 and Arg178) in the α6-helix region of the C-terminal domain was disordered. In RmSBP-pH 6.0, the electron density for three residues (Asp151, Ala152, and Glu153) in the β5–β6 loop was disordered. In RmSBP-pH 7.5, all residues were fitted into the electron density map without any disorder, but structural flexibility with high B-factor values was observed (see below). The superimposition of the crystal structures of RmSBP at pH 5.5, 6.0, and 7.5 onto the previously reported crystal structure of RmSBP at pH 4.5 showed similarity over all Cα atoms (r.m.s. deviation of 0.7196–1.3262 Å), but two regions with significantly different conformations were observed. In the β1–α2 loop region, the Cα atoms of the loop portion at pH 6.0 were shifted by about 2.0 Å relative to the Cα atoms in the structures obtained at the other pH values, and by about 1.3 Å in the α2-helix region (Fig. 2b). The extended C-terminal domain, on the other hand, did not align structurally (Fig. 2b): it was shifted by 2.8 Å in the α5-helix region, and by 2.3 Å and 2.7 Å in the loop and C-terminal region, respectively (Fig. 2b).
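
The Cα shifts and r.m.s. deviations quoted above are distances between matched atoms of superimposed models. As a minimal sketch, and not the procedure used in this study (the superposition software is not stated here), the following Python code computes per-residue Cα displacements and the overall r.m.s. deviation for two models that are assumed to be already superimposed and matched residue by residue. The function names and the coordinates are hypothetical and serve only as an illustration.

```python
import math


def ca_displacements(coords_a, coords_b):
    """Distance (in Å) between matched Cα atoms of two superimposed models.

    Both arguments are lists of (x, y, z) tuples in the same residue order.
    The models are assumed to be superimposed already; no fitting is done here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) over all matched Cα atoms."""
    distances = ca_displacements(coords_a, coords_b)
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical Cα coordinates (not taken from any deposited structure).
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (3.8, 2.0, 0.0), (7.6, 0.0, 1.0)]

shifts = ca_displacements(model_a, model_b)  # per-residue shifts in Å
overall = rmsd(model_a, model_b)             # single summary value in Å
```

The per-residue list localizes where two models differ (for example, a loop), whereas the r.m.s. deviation summarizes the whole set of matched atoms in one number, which is why both kinds of values appear in the comparison above.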

Subsequently, we performed B-factor (temperature factor) analysis on all RmSBP structures (Fig. 3). At pH 4.5, RmSBP showed a rigid fold with low B-factor values, except at the N-terminus (Fig. 3a). At pH 5.5, the B-factor of RmSBP was relatively high in the β1–α2 loop region of the α/β domain, and part of the C-terminal domain was not built because of the lack of electron density in that region (Fig. 3a). At pH 6.0, the overall α/β domain of RmSBP showed high rigidity, but the α6-helix region in the C-terminal domain displayed high B-factor values (Fig. 3a). At pH 7.5, there were no disordered regions in the electron density map of RmSBP, as in the structure at pH 4.5, but relatively high flexibility was observed in the β1–α2 loop and the C-terminal domain (Fig. 3a). The analysis of normalized Cα B-factor values for the four RmSBP structures showed that residues in the β1–α2 loop, β5–β6 loop, and C-terminal helix had relatively higher flexibility than other residues (Fig. 3b). However, the extent of the disordered or high-B-factor regions in the RmSBP structures did not change in proportion to the acidity or basicity of the crystallization pH. Next, we analyzed the electrostatic surfaces of the structures of RmSBP at pH 5.5, 6.0, and 7.5 (Fig. 3c). The surface charges on the C-terminal region of RmSBP differ among the structures because part of that region is disordered. In contrast, the putative substrate binding site is negatively charged in all three structures, and no specific differences were found among them (Fig. 3c). Thus, the potential substrate binding sites are well preserved, while the peripheral loops of RmSBP are flexible.
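
Normalizing B-factors makes structures refined at different resolutions comparable on a common scale. The text does not state which normalization was applied; the sketch below assumes a per-structure z-score, (B − mean) / standard deviation, which is one common choice. The function names, residue numbers, B-factor values, and the threshold of 1.0 are all hypothetical and are not data from this study.

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors):
    """Return z-scores of Cα B-factors within one structure.

    Assumption: normalization is (B - mean) / population standard deviation.
    """
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        raise ValueError("all B-factors are identical; z-score is undefined")
    return [(b - mu) / sigma for b in b_factors]


def flexible_residues(residue_ids, b_factors, threshold=1.0):
    """List residues whose normalized B-factor exceeds the threshold."""
    z_scores = normalize_b_factors(b_factors)
    return [res for res, z in zip(residue_ids, z_scores) if z > threshold]


# Hypothetical residue numbers and Cα B-factors (Å^2) for illustration only.
residues = [101, 102, 103, 104, 105]
b_values = [20.0, 20.0, 20.0, 20.0, 45.0]

z = normalize_b_factors(b_values)             # [-0.5, -0.5, -0.5, -0.5, 2.0]
flagged = flexible_residues(residues, b_values)  # [105]
```

Because each structure is scaled by its own mean and spread, a residue is flagged only when it is flexible relative to the rest of the same model, which is the sense in which loops are compared across the four pH conditions.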

Fig. 3

Analysis of flexible region of RmSBPs. a B-factor representation of RmSBPs at pH 4.5, 5.5, 6.0, and 7.5. b A plot showing normalized B-factor values of Cα atoms of RmSBPs at pH 4.5 (red), 5.5 (orange), 6.0 (green), and 7.5 (blue). c Electrostatic surface of structures of RmSBP at pH 5.5, 6.0, and 7.5. Putative substrate binding sites commonly exhibited a negative charge (Color figure online)

4 Discussion

We performed comparative, computational, and structural analyses of RmSBP to gain a comprehensive understanding of the molecular functions of this short-length SBP. We previously described RmSBP as consisting of a single α/β domain; in this study, however, we divided the RmSBP structure into an α/β domain and an extended C-terminal domain based on the comparative structural analysis. In particular, the C-terminal domain has a unique topology that distinguishes RmSBP from other SBPs. In the substrate binding model, four substrate binding sites were predicted at positions essential for the conformational changes required to recognize the substrate. However, because the previously determined RmSBP structure at pH 4.5 was highly rigid with low B-factor values, there was no experimental evidence on whether RmSBP is actually structurally flexible. To observe the structural flexibility of RmSBP, we determined three crystal structures of RmSBP using three new crystallization conditions that had not been reported previously. The structural flexibility of the β1–α2 loop, β5–β6 loop, and C-terminal helix region of RmSBP might provide an initial framework for structural studies on short-length SBPs, as well as other SBPs.

The N- and C-terminal domains of SpSBP and VcSBP have an α/β fold in common (Fig. 4a). Therefore, we superimposed the structures to determine the similarity between the α/β fold of RmSBP and the N-terminal domains of SpSBP and VcSBP (Fig. 4b). However, unlike the C-terminal domain of VcSBP, the N-terminal domain showed no similarity (Fig. 4b). This is consistent with the absence of amino acid sequence similarity; structural similarity was difficult to find because of differences in the length and direction of the β-strands and the length of the helices in the α/β fold. On the other hand, when the structure of RmSBP was superimposed on both the N- and C-terminal domains of VcSBP, the two RmSBP molecules overlapped in the resulting dimeric arrangement (Fig. 4c). Thus, even if RmSBP exists as a homodimer, it would have a conformation different from those of existing SBPs, and large structural changes would be required for substrate recognition. RmSBP did show structural similarity to the C-terminal domains of both SpSBP and VcSBP. In their crystal structures, SpSBP and VcSBP are complexed with tryptophan and phenylalanine, respectively, and these amino acids interact with Asn and Ile at the same position in the structure. In RmSBP, Glu107 and Ser109 occupy the corresponding positions. We therefore expect that even if RmSBP recognizes amino acids, they will be different types of amino acid substrates. To investigate whether RmSBP is able to recognize amino acids, an RmSBP crystal was soaked in a cryoprotectant solution containing an amino acid mixture and diffraction data were collected. However, no electron density suggestive of an amino acid was found at the potential substrate binding site of RmSBP. This result indicates that RmSBP does not recognize amino acids or does not have a high affinity for them; other partner molecules may be needed for the recognition of amino acids. Of note, the amino acid binding sites of SpSBP and VcSBP have a hydrophobic surface, whereas the potential substrate binding site of RmSBP has a negatively charged surface. These data therefore suggest that RmSBP recognizes a charged substrate distinct from those recognized by SpSBP or VcSBP.

Fig. 4

Comparison and superimposition of RmSBP and VcSBP. a Ribbon representation of VcSBP consisting of two α/β domains. b Superimposition of two RmSBPs onto the N- and C-terminal α/β domains of VcSBP. c Dimeric model of RmSBP. Two RmSBP molecules superimposed onto the N- and C-terminal domains of VcSBP. The region where the molecules overlap in the dimeric RmSBP model is indicated by a dotted circle

Although our study provides important information for understanding the structural properties of a short-length SBP, further biochemical experiments on a variety of potential SBP substrates are needed to understand its exact biological function. In this regard, it is necessary not only to identify the substrates of RmSBP but also to study in depth its partner proteins and their functional relevance to MCP-bound RmSBP. The crystal structures of RmSBP determined in this study are expected to differ from the structure of RmSBP at 65 °C. To better understand the molecular flexibility of RmSBP, we believe that equilibrium molecular dynamics simulations at the growth temperature would be worthwhile. Of note, RmSBP shows a sequence similarity of 20% with the ORF1ab polyprotein of SARS-CoV-2 (strain SARS-CoV-2_HKU-SZ-001_2020), and was therefore recently used as a template for structure modeling [21]. Although the amino acid sequence similarity between RmSBP and the ORF1ab polyprotein of SARS-CoV-2 is not high, our structural results for RmSBP may help in interpreting the model structure of ORF1ab of SARS-CoV-2.