Introduction

Carbonic anhydrases (CAs) were the first zinc-containing metalloenzymes to be identified. They are broadly conserved, catalyze the reversible hydration of carbon dioxide to bicarbonate (CO2 + H2O ↔ HCO3− + H+) (Meldrum and Roughton 1933), and have a variety of physiological roles. On the basis of amino acid sequence and structural similarity, CAs from various organisms have been classified into six distinct families (α, β, γ, δ, ζ, and η) (Herrou and Crosson 2013; Anthony et al. 2004). Among them, α, β, and γ are the major families. Family η was first proposed in 2014 (Prete et al. 2014). To date, CAs belonging to various families have been cloned and purified, and their crystal structures have been determined, including those from animals, plants, bacteria, fungi, and worms (Joseph et al. 2010; Nishimori et al. 2006, 2009). All CAs catalyze the same chemical transformation at a divalent-metal-containing active site, although the structures of these enzymes are distinct (Tripp et al. 2001; Smith et al. 1999; Esbaugh and Tufts 2006).

The γ-CAs mainly come from methanogens (methane-producing archaea) and other ancient species that grow in hot springs. Sequence-homology studies of CAs from bacteria and archaea suggest that the γ-CAs are likely to be the prototype of all carbonic anhydrases. Furthermore, phylogenetic studies suggest that the γ class evolved near the origin of life (Smith et al. 1999). More than 20 crystal structures of γ-CAs have been solved since 1996 (Kisker et al. 1996). The overall fold of γ-CAs is a regular prism: each monomer forms a triangular prism-like structure composed of a left-handed, seven-stranded parallel β-helix with a long α-helix appended to the C-terminus and positioned antiparallel to the axis of the β-helix (Herrou and Crosson 2013). Three monomers assemble into a trimer with three Zn2+-containing active sites located at the monomer–monomer interfaces; each Zn2+ is coordinated by three histidines from two adjacent monomers (Ferry 2010; Smith and Ferry 2000). Although the overall structures are almost the same, each structure has its own characteristics. Structural diversity is concentrated in four regions: the N-terminus, C-terminus, β1–β2 loop, and β10–β11 loop.

To date, only a few γ-CAs have been shown to have carbonic anhydrase activity in vitro, including γ-CAs from Methanosarcina thermophila (Zimmerman et al. 2010; Alber and Ferry 1994, 1996), Thermosynechococcus elongatus (activated by oxidizing agent) (Pena et al. 2010), Burkholderia pseudomallei (activated by amino acids and amines) (Vullo et al. 2017), and Porphyromonas gingivalis (Del et al. 2013). By contrast, RicA from Brucella abortus, a metalloprotein that acts as a Rab2-binding virulence effector (Herrou and Crosson 2013), and the γ-CA-related proteins YrdA and Cap of Escherichia coli (Park et al. 2012) have no enzymatic activity despite significant homology to the Methanosarcina thermophila γ-CA. The molecular basis of this difference in the activities of these closely related proteins remains unclear. Beyond the interconversion of carbon dioxide and bicarbonate, the function of γ-CAs remains largely undefined and requires further study.

Thermus thermophilus HB8 is a Gram-negative, extremely thermophilic bacterium that can grow at temperatures ranging from 50 to 82 °C (Oshima and Imahori 1971, 1974). It has a small genome (2.1 Mbp) but retains all the genes essential for growth (Ohtani et al. 2010). Thermus thermophilus HB8 is a promising model organism for whole-cell projects (Yokoyama et al. 2000) and is suitable for physicochemical characterization, including X-ray crystallographic analysis (Iino et al. 2008), because its proteins are thermostable and genetic tools for functional analysis are readily available.

In this study, we report the crystal structure of γ-TtCA at 2.3 Å resolution in space group P1. The asymmetric unit contains two trimers and six catalytic Zn2+ ions. As expected, γ-TtCA adopts the typical fold of γ-CAs, consisting of a left-handed β-helix and a C-terminal α-helix. Each Zn2+ is coordinated in a tetrahedral fashion by three histidines and a phosphate ion at the interface between two monomers. Compared with other γ-CA structures, γ-TtCA shows conformational differences in several regions, most notably the replacement of half of the C-terminal α-helix by a novel loop. Purified γ-TtCA exhibits no significant carbonic anhydrase activity compared to α-class carbonic anhydrases. This work offers insight into the structural diversity and potential functions of γ-CAs.

Materials and methods

Strains, plasmids, and chemicals

The Escherichia coli strains used in this study were purchased from TransGen Biotech (Beijing, China). Sangon Biotech (Shanghai, China) provided the primers, SanPrep Column DNA Gel Extraction Kit (B518131-0100), SanPrep Column Plasmid Mini-Preps Kit (B518191-0100), Pfu DNA polymerase, restriction enzymes, T4 DNA ligase, and all analytical grade chemicals. The GST affinity column was purchased from GE Healthcare (Beijing, China). Codon-optimized cDNA for full-length γ-TtCA was synthesized by GENEWIZ (Suzhou, China).

Sequence analyses

The γ-TtCA sequence was retrieved from NCBI (accession number YP_145145.1). Sequence alignments were performed with ClustalX (Larkin et al. 2007), and secondary structure and disorder predictions were performed with the Phyre webserver (Kelley and Sternberg 2009), in both cases using default parameters.

Cloning, expression and purification

The genes encoding full-length γ-TtCA and γ-TtCAΔ169 (residues 1–169) were PCR-amplified from the synthesized DNA with primers 5′-CGGGGATCCATGAGCGTGTATCGCTTTG-3′ (forward primer), 5′-GGCCTCGAGTGCCACCGGAAACAGTGC-3′ (reverse primer for γ-TtCAΔ169), and 5′-GGCCTCGAGTTACTCCGGGGCCAGCAG-3′ (reverse primer for full-length γ-TtCA). The amplified DNA fragments were digested with BamHI and XhoI and then cloned into the vector pGEX-6p-1 (GE Healthcare). The accuracy of the inserts was verified by sequencing.
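As an illustrative check, not part of the reported protocol, the short Python sketch below confirms that the primers listed above carry the recognition sites of the two restriction enzymes used for cloning. The primer sequences are copied from the text; the recognition sequences GGATCC (BamHI) and CTCGAG (XhoI) are the standard ones, and the helper names `find_sites` and `reverse_complement` are hypothetical.

```python
# Primer sequences copied from the text (5' to 3').
PRIMERS = {
    "forward": "CGGGGATCCATGAGCGTGTATCGCTTTG",
    "reverse_delta169": "GGCCTCGAGTGCCACCGGAAACAGTGC",
    "reverse_full_length": "GGCCTCGAGTTACTCCGGGGCCAGCAG",
}

# Standard recognition sequences of the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}


def find_sites(primer):
    """Return {enzyme: 0-based start index} for every site present in the primer."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
    # Coding-strand view of the full-length reverse primer.
    print(reverse_complement(PRIMERS["reverse_full_length"]))
```

Read on the coding strand, the full-length reverse primer places a TAA stop codon directly before the XhoI site, whereas the γ-TtCAΔ169 reverse primer contains no such codon in that position.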

The recombinant plasmids of γ-TtCA and γ-TtCAΔ169 were transformed into Escherichia coli strain BL21 (DE3), and the proteins were overexpressed as glutathione S-transferase (GST) fusion proteins. The cells were cultured at 37 °C in 800 mL LB medium containing 100 μg/mL ampicillin. Once the optical density at 600 nm (OD600) reached 0.7, protein expression was induced with 0.2 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG) for an additional 4.5 h. Harvested cells were resuspended in lysis buffer containing 50 mM Hepes-HCl (pH 6.5), 200 mM NaCl, and 5% (vol/vol) glycerol and were disrupted by sonication at 277 K. The lysate was clarified by ultracentrifugation at 40,000×g for 20 min at 277 K to remove the cell debris. The supernatant was then loaded twice onto a GST column pre-equilibrated with lysis buffer, and the GST tag was removed by digestion with PreScission protease (GE Healthcare) overnight at 277 K. SDS-PAGE analysis was performed to assess the expression of γ-TtCA and γ-TtCAΔ169. The eluted γ-TtCAΔ169 was further purified on a HiTrap™ Q HP column (GE Healthcare) and by Superdex-200 gel filtration chromatography (GE Healthcare). The purified protein was then concentrated to 6 mg/mL in a buffer containing 50 mM Hepes-HCl (pH 6.5) and 200 mM NaCl, and its purity was assessed by SDS-PAGE.

Crystallization and X-ray data collection

The γ-TtCAΔ169 protein was stored in the buffer described above, and initial crystallization conditions were screened by the hanging-drop vapor-diffusion method at 18 °C using commercial crystal screening kits. Crystals were obtained by mixing 1.5 μL of the protein solution with an equal volume of reservoir solution and equilibrating the mixed drop against 300 μL of reservoir solution. Small crystals of γ-TtCAΔ169 first appeared after 2 days in 200 mM trisodium orthophosphate (pH 9.1) and 20% (wt/vol) PEG 3350. The final optimized crystals were grown in 200 mM tripotassium orthophosphate (pH 8.9), 19% (wt/vol) PEG 3350, and 5% (vol/vol) glycerol. Crystals were harvested, cryoprotected in the well solution containing an additional 20% (vol/vol) glycerol, and cooled in a dry nitrogen stream at 100 K for X-ray data collection. The data set was collected to 2.3 Å resolution at a wavelength of 1 Å on the BL18U1 beamline at the Shanghai Synchrotron Radiation Facility. All data were indexed, integrated, and scaled using the HKL2000 package (Otwinowski and Minor 1997). The crystals belonged to the triclinic space group P1 with cell parameters a = 48.7 Å, b = 69.2 Å, c = 83.3 Å, α = 75.2°, β = 74.4°, and γ = 89.2°. The initial phases were calculated with Phaser in the PHENIX suite (Adams et al. 2010) using RicA from Brucella abortus (PDB code: 4N27) as the starting model. All models were manually built into the modified experimental electron density using COOT (Emsley and Cowtan 2004) and further refined in PHENIX (Adams et al. 2010). Model geometry was verified using the program PROCHECK (Laskowski et al. 1993). Structural figures were drawn using PyMOL (Delano 2009).
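For orientation, the reported cell parameters can be turned into a unit-cell volume with the general triclinic formula V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ). The Python sketch below is a minimal illustration of that arithmetic. The Matthews coefficient step relies on an assumed monomer mass of 18,300 Da, which is a placeholder and not a value reported in this paper; six chains per cell follows from the two trimers in the P1 asymmetric unit.

```python
import math


def triclinic_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit-cell volume (cubic angstroms) from cell lengths (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


def matthews_coefficient(cell_volume, chains_per_cell, chain_mass_da):
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return cell_volume / (chains_per_cell * chain_mass_da)


# Cell parameters as reported in the text.
volume = triclinic_cell_volume(48.7, 69.2, 83.3, 75.2, 74.4, 89.2)

# ASSUMPTION: placeholder monomer mass, not taken from the paper.
ASSUMED_MONOMER_MASS_DA = 18_300.0

# P1 has one asymmetric unit per cell; the text reports six monomers in it.
vm = matthews_coefficient(volume, chains_per_cell=6, chain_mass_da=ASSUMED_MONOMER_MASS_DA)

if __name__ == "__main__":
    print(f"cell volume: {volume:.0f} A^3")
    print(f"Matthews coefficient (assumed mass): {vm:.2f} A^3/Da")
```

Under that assumed mass the coefficient falls in the range commonly seen for protein crystals, which is consistent with six monomers per cell.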

Inductively coupled plasma mass spectrometry

To assay for the presence of zinc in γ-TtCAΔ169, inductively coupled plasma mass spectrometry (ICP-MS) was performed on a NexION™ 350 instrument. γ-TtCAΔ169 at different concentrations was dialyzed against pure water, and the zinc concentration was then calculated.

Carbonic anhydrase activity assay

γ-TtCAΔ169 was assayed for CA activity using a colorimetric method adapted from the Wilbur–Anderson assay (Wilbur and Anderson 1948; Khalifah 1971). The details of this assay have been described before (Herrou and Crosson 2013). γ-TtCAΔ169 (final concentration 5 μM) was mixed in a 1 cm cuvette with 500 μL of a Tris-NaCl phenol red buffer (20 mM Tris, pH 8.8, 150 mM NaCl, 1 mM ZnCl2, and 200 μM phenol red) and 500 μL of water saturated with CO2 using dry ice. Immediately after the addition of CO2-saturated water, the pH decrease was monitored indirectly by measuring the red-to-yellow color shift at 558 nm. Dialysis buffer alone and bovine carbonic anhydrase (final concentration 5 nM, Sigma-Aldrich) were used as negative and positive controls, respectively. Activities, expressed in Wilbur–Anderson units (WAU) per milligram, were measured at two temperatures (4 °C and 10 °C) and calculated with the equation WAU = (t0 − t)/t, where t0 and t are the times required for the uncatalyzed (buffer control) and catalyzed (with enzyme) reactions, respectively, to drop to the transition point of the dye (Wilbur and Anderson 1948).
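The activity equation above can be written as a small Python sketch. The function names and the example times are hypothetical and serve only to illustrate the arithmetic; they are not measurements from this study.

```python
def wilbur_anderson_units(t0, t):
    """WAU = (t0 - t) / t, with t0 the uncatalyzed and t the catalyzed time.

    Both times must be in the same unit (for example seconds).
    """
    if t0 <= 0 or t <= 0:
        raise ValueError("times must be positive")
    return (t0 - t) / t


def specific_activity(t0, t, enzyme_mg):
    """Activity normalized to the amount of enzyme in the assay (WAU per mg)."""
    if enzyme_mg <= 0:
        raise ValueError("enzyme amount must be positive")
    return wilbur_anderson_units(t0, t) / enzyme_mg


if __name__ == "__main__":
    # Hypothetical example: buffer alone takes 60 s, the enzyme sample 20 s.
    print(wilbur_anderson_units(60.0, 20.0))
```

A sample that reaches the dye transition point as slowly as the buffer control gives 0 WAU, so values close to zero indicate little or no catalysis.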

Results and discussion

Sequence analysis and structure prediction

The 690 bp coding sequence of γ-TtCA encodes a polypeptide of 230 amino acids with a calculated molecular mass of 24.3 kDa. Multiple sequence alignments with solved CA structures in the RCSB PDB revealed that γ-TtCA has the highest sequence identity with RicA from Brucella abortus (4N27, up to 46%) (Herrou and Crosson 2013). Sequence identities with other CAs are as follows: 32%, 1QRM (Methanosarcina thermophila) (Iverson et al. 2000); 37%, 1XHD (Bacillus cereus); 43%, 1V67 (Pyrococcus horikoshii) (Rangarajan et al. 2008); 37%, 3IXC (Anaplasma phagocytophilum); 32%, 3KWC (Thermosynechococcus elongatus) (Pena et al. 2010); 37%, 3R3R (Salmonella enterica); 35%, 3TIO (Escherichia coli) (Park et al. 2012); 40%, 4MFG (Clostridiales). The sequence alignment shows that, in addition to the carbonic anhydrase domain, γ-TtCA has a C-terminal domain that is absent from the other proteins. This domain comprises about 60 amino acids (Figs. 1, 2a). Therefore, structure prediction was performed with the Phyre webserver, and the results suggest that the C-terminal domain is composed of two α-helices, a long helix (up to 40 amino acids) and a short helix, which may be disordered or less stable than predicted (Fig. 2b).

Fig. 1

Multiple sequence alignment of γ-TtCA with other γ-CAs. a Sequence alignment of γ-TtCA (PDB code: 6IVE), 3R3R, 3TIO, 3IXC, 1XHD, 4MFG, 4N27, 1V67, 2FKO, 1QRM and 3KWC. Strictly conserved residues are shown in white text on a red background, while conserved residues are shown as red text in blue frames. The residues involved in zinc ion binding are indicated by blue triangles, and other residues that participate in the active center are indicated by red dots. The red dashed frames denote three loops: the β1–β2 loop, the β9–β10 loop and the unique loop. The sequence numbering and the secondary structure elements of γ-TtCA are shown at the top of the figure. (Color figure online)

Fig. 2

a Schematic diagram of the γ-TtCA construct. The carbonic anhydrase domain and the C-terminal domain suggested by sequence alignment are shown in blue and red, respectively. The residue numbers at each end are given at the top. b Secondary structure and disorder prediction for the γ-TtCA C-terminal domain performed with the Phyre webserver. Disordered regions are indicated by black question marks. (Color figure online)

Expression, purification, and characterization of γ-TtCA

On the basis of the structure prediction, full-length γ-TtCA and γ-TtCAΔ169 were expressed, and the gene products were purified for functional characterization. Full-length γ-TtCA showed clear degradation: after 1 day of storage, its molecular weight was almost the same as that of γ-TtCAΔ169, as judged by SDS-PAGE. Therefore, only γ-TtCAΔ169 was successfully expressed, purified, and crystallized.

Overall crystal structure of γ-TtCAΔ169

γ-TtCAΔ169 crystallized in space group P1. The structure of γ-TtCAΔ169 was determined at 2.3 Å resolution by molecular replacement using the structure of RicA from Brucella abortus (PDB code: 4N27) as the search model. The asymmetric unit contains six molecules of γ-TtCAΔ169 organized as two homotrimers (Fig. 3a, b). In the final model, there are no major structural differences between the two homotrimers. The final crystallographic data and refinement statistics are summarized in Table 1. Each monomer contains 168 amino acids. Met1 and five more amino acids (Gly-Pro-Leu-Gly-Ser) from the vector are not visible in the electron density map (Fig. 3a). Three zinc ions and phosphate ions (Fig. 3b) were also observed in each homotrimer. The monomer adopts the typical fold of γ-CAs, consisting of a left-handed β-helix and a short C-terminal α-helix (Fig. 3a, b). Strong hydrogen bonding mediated by the loops that connect the β-strands of the prisms, together with hydrophilic and hydrophobic interactions involving the C-terminal α-helix, are the main forces maintaining the homotrimer structure.

Fig. 3

The crystal structure of γ-TtCAΔ169. a Structure of the γ-TtCAΔ169 monomer, with secondary structure labeled. The red dot and blue dot denote the N- and C-termini of γ-TtCAΔ169, respectively. b The γ-TtCAΔ169 trimer viewed down the threefold axis. The zinc-binding histidine residues and the phosphate ion are shown as sticks. The zinc ion is shown as a red sphere. c Zoomed view of the active site: residues H65 and H87 (from molecule C, colored in magenta) and residue H82 (from molecule A, colored in green) are involved in coordination of the zinc ion (red sphere). Four water molecules and the phosphate ion near the zinc site are highlighted in cyan and brown. Distances between the zinc ion, the side chains of the related amino acids, and the water molecules are also labeled. d Closed conformation. The bound zinc ion (Zn; red ball) is coordinated by three N atoms of three independent His residues and an oxygen atom from the phosphate ion. Nitrogen atoms are represented by blue sticks. The 2Fo-Fc electron density for the coordinating His residues, the bound Zn2+ and the phosphate ion at the active site is contoured at 1.5σ. (Color figure online)

Table 1 Data collection and refinement statistics for γ-TtCAΔ169

All γ-CAs require a metal ion (Zn2+ or Fe2+) coordinated at the interface between two protomers. In the crystal structure of γ-TtCAΔ169, ten additional amino acids, some water molecules, and Zn2+ constitute the active site and form a complicated hydrogen-bonded network (Fig. 3c). The Zn2+ is coordinated by three histidines (H65 and H87 from one monomer and H82 from the adjacent monomer) and a phosphate ion, for which strong electron density was observed (Fig. 3d). The locations of the Zn2+ ions in γ-TtCAΔ169 coincide with those in Cam and structurally related proteins. The identity of the bound metal ion as zinc is consistent with the ICP-MS data. The content of three different divalent metal ions (Fe2+, Mg2+ and Zn2+) in the γ-TtCAΔ169 protein solution was tested, and the content of Zn2+ was much higher than that of the others (Fig. 4). The zinc concentration was 0.022 μg/μL in γ-TtCAΔ169 at 6 mg/mL, which corresponds to a zinc:monomer molar ratio of nearly 1. Water molecules are crucial for carbonic anhydrase catalytic activity and have been described in several γ-CA structures, including Cap (Rangarajan et al. 2008), Cam (Kisker et al. 1996; Iverson et al. 2000), and CcmM (Pena et al. 2010). In this structure, there are likewise some water molecules in the vicinity of the zinc ion, but they are not coordinated to Zn2+ because a phosphate ion occupies the position normally occupied by water (Fig. 3c).

Fig. 4

Zinc content of γ-TtCAΔ169 determined by ICP-MS. The zinc concentration is 0.022 μg/μL in γ-TtCAΔ169 at 6 mg/mL, corresponding to a zinc:monomer molar ratio of 1.024:1
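The conversion from the measured mass concentrations to a molar ratio can be sketched as follows. This is a minimal Python illustration, not the calculation script used for the study. The zinc molar mass (65.38 g/mol) is the standard atomic weight; the monomer mass is not stated in the text, so the sketch back-calculates the mass implied by the reported 1.024:1 ratio instead of assuming one.

```python
ZN_MOLAR_MASS_G_PER_MOL = 65.38  # standard atomic weight of zinc


def molar_concentration_um(mass_conc_g_per_l, molar_mass_g_per_mol):
    """Convert a mass concentration (g/L) to micromolar."""
    return mass_conc_g_per_l / molar_mass_g_per_mol * 1e6


def implied_monomer_mass_da(protein_g_per_l, zinc_um, zinc_to_monomer_ratio):
    """Monomer mass (Da) that reproduces a given zinc:monomer molar ratio."""
    protein_um = zinc_um / zinc_to_monomer_ratio
    return protein_g_per_l / (protein_um * 1e-6)


# 0.022 ug/uL equals 0.022 g/L, and 6 mg/mL equals 6 g/L.
zinc_um = molar_concentration_um(0.022, ZN_MOLAR_MASS_G_PER_MOL)
monomer_mass = implied_monomer_mass_da(6.0, zinc_um, 1.024)

if __name__ == "__main__":
    print(f"zinc: {zinc_um:.1f} uM")
    print(f"implied monomer mass: {monomer_mass:.0f} Da")
```

The zinc concentration works out to about 336 μM, and the reported ratio implies a monomer mass of roughly 18.3 kDa, which is plausible for a construct of about 170 residues.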

Carbonic anhydrase activity of γ-TtCAΔ169

The carbonic anhydrase activity of γ-TtCAΔ169 was tested at two different temperatures (4 °C and 10 °C) (Fig. 5a, b). Activities of 0.6 and 0.9 WAU were measured. The same experiment performed with bovine carbonic anhydrase as a control gave activities of 12,000 WAU at 4 °C and 14,000 WAU at 10 °C. Thus, γ-TtCAΔ169 exhibits almost no carbonic anhydrase activity. The Zn2+-containing active site at the subunit interfaces includes not only the zinc ion and the three histidines that coordinate it but also some nearby amino acid residues (Glu, Gln or Thr) and some water molecules. All of these residues and molecules form a complicated hydrogen bond network that catalyzes the interconversion of carbon dioxide and bicarbonate. The molecular basis for the differences in the activities of these closely related proteins remains unclear.

Fig. 5

Carbonic anhydrase assay. CO2 hydration by γ-TtCAΔ169 (5 μM, 10 μM) was indirectly evaluated by measuring the decrease in pH in a phenol red buffer at 4 °C and 10 °C. The change in absorbance at 558 nm was recorded (blue line for 4 °C, magenta line for 10 °C) and compared with negative (buffer alone, black line) and positive (5 nM bovine carbonic anhydrase, red line) controls. (Color figure online)

The distances between Zn2+ and the three coordinating histidine residues might be one factor that affects activity. These distances (Å) in 12 γ-CA structures are summarized in Table 2. The table shows that the Zn2+–His distance varies between 2.0 and 2.4 Å. More than half of the distances are 2.1 Å, particularly in 1THJ, 3TIO, 4MFG, and γ-TtCAΔ169. Although they come from different species, 1V67, 3IXC, and 3KWC share similar distances. According to the literature, 1QRM, 1THJ, and 3KWC exhibit carbonic anhydrase activity, while the activity of the others is unclear. To our surprise, the distances in these three active carbonic anhydrases differ from each other. Thus, the distance between Zn2+ and the coordinating histidine residues is likely not the key factor for carbonic anhydrase activity. Differences in activity are very complex and require additional study; however, this structure demonstrates the structural diversity in Zn2+ binding sites.

Moreover, the α-class carbonic anhydrases, mainly from mammalian sources, also catalyze the reversible hydrolysis of esters. No esterase activity has been reported for other classes of carbonic anhydrases, including the γ-class (Capasso et al. 2012; Tripp et al. 2001). Cam, the carbonic anhydrase from Methanosarcina thermophila, has high similarity with γ-TtCAΔ169, represents the prototype of the γ-class of carbonic anhydrases, and does not show any esterase activity (Tripp et al. 2001; Alber and Ferry 1996). The esterase activity of γ-TtCAΔ169 was also tested, and no significant activity was detected.

Table 2 Comparison of γ-TtCAΔ169 and other related γ-carbonic anhydrases

Comparison of γ-CAs reveals the diversity of structures

A comparison of γ-TtCAΔ169 with structures reported in the Protein Data Bank (PDB) using the DALI structure comparison service (Holm and Sander 1995) revealed a high structural similarity with other γ-CAs. The overall root-mean-square deviation (r.m.s.d.) and sequence identity values are listed in Table 2. After all of these molecules were superposed, we found that the structures adopt nearly the same fold, especially in the prism-like domain. At the same time, regions that vary in sequence and structure are observed at each terminus and in some loops (Fig. 6).
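For readers unfamiliar with the r.m.s.d. values quoted here, the quantity itself can be sketched in a few lines of Python. This shows only the final formula applied to two lists of already matched and already superposed atom positions; it does not reproduce the alignment and superposition that DALI performs, and the function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    The i-th point in coords_a is assumed to be matched to the i-th point in
    coords_b, and both sets are assumed to be superposed already.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Toy example: every matched atom is displaced by 1 angstrom along z.
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(reference, shifted))
```

Lower values indicate that the matched atoms of two structures lie closer together after superposition.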

Fig. 6

Stereo representation of monomeric γ-TtCAΔ169 superposed on γ-CAs and related structures. The superposed structures are displayed as cartoon diagrams in different colors and are labeled with their respective PDB codes in matching colors. Insets show the four common variable regions: a N-terminus, b C-terminus, c β1–β2 loop and d β10–β11 loop. (Color figure online)

Four common variable regions in γ-CA structures have been reported: the N-terminus, C-terminus, β1–β2 loop, and β10–β11 loop. In these regions, the conformation of γ-TtCAΔ169 is similar to that of one or more known structures. At the N-terminus, γ-TtCAΔ169 shares a similar sequence length with 1V67, 1XHD, 2FKO, and 4MFG (Fig. 1a), while the others have longer N-terminal sequences. Met1-Phe6 form a β-strand in γ-TtCAΔ169, as in most other γ-CAs, except in 4MFG, where this segment is an α-helix (Fig. 6a). The next variable region is the C-terminus. In γ-TtCAΔ169, the C-terminal α-helix is followed by a β-strand (β21), similar to 4N27, 1V67, and 2FKO, whereas there is a helix in 3KWC (Fig. 6b). This β21 also contributes to trimer formation through hydrogen bonding with β1 from the adjacent monomer. Compared with these two regions, the variation in the β1–β2 loop and β10–β11 loop is more obvious. All γ-CA structures except 1QRM and 3KWC adopt almost the same conformation in the β1–β2 loop, which is composed of six amino acids, while 1QRM and 3KWC display a much longer loop (Figs. 1, 6c). Finally, the differences in the β10–β11 loop of these structures are obvious in both sequence and conformation (Figs. 1, 6d). Structures 1QRM, 3R3R, 3TIO, and 3KWC have a longer loop than the others, which even includes two β-strands. For γ-TtCAΔ169, the β10–β11 loop is almost the same as in 1V67, 1XHD, 2FKO, 3IXC, 4MFG, and 4N27 in both sequence and conformation. These results suggest that these four regions are prone to change; it is very likely that these structural changes are intimately related to the evolution of γ-CAs.

Although γ-TtCAΔ169 shows no distinctive structural features in the four common variable regions mentioned above, we found a novel variable region in the C-terminal α-helix. This finding was unexpected. In general, the length of the C-terminal α-helix is about 34 Å, which is essentially equal to the height of the prism-like domain. However, in the structure of γ-TtCAΔ169, half of the C-terminal α-helix is replaced by a loop with a length of 18 Å (Fig. 7a). Sequence comparison shows that six amino acid residues (Asp-Asp-Tyr-Ala-Tyr-Ser) after Pro147 are missing (Fig. 1). The unique loop is composed of six residues (Ile-Asp-Pro-Pro-Gly-Asn), and both this loop and the short helix are well built in the structural model (Fig. 7b, c). Proline is a helix breaker because of its side-chain constraints and steric hindrance. In this unique loop, the two prolines completely disrupt the C-terminal α-helix but compensate for the loss of overall structural integrity caused by the missing amino acid residues.

Fig. 7

Detailed view of the unique loop and short C-terminal α-helix in γ-TtCAΔ169. a The lengths of the unique loop and short C-terminal α-helix are indicated. The 2Fo-Fc electron density maps for the unique loop (b) and C-terminal α-helix (c) are contoured at 1.5σ. (Color figure online)

In conclusion, the γ-class carbonic anhydrase from Thermus thermophilus HB8 was successfully cloned, expressed, and purified, and its crystal structure was solved at 2.3 Å resolution. The recombinant γ-TtCAΔ169 is a trimer in solution and adopts the classic fold: like other homologous enzymes, it contains a left-handed β-helix followed by a C-terminal α-helix. However, γ-TtCAΔ169 does not exhibit significant carbonic anhydrase activity. γ-TtCAΔ169 also has its own characteristics: in addition to the four common variable regions, we found a novel variable region in the C-terminal α-helix.