Biological context

A novel coronavirus (SARS-CoV-2), which causes Coronavirus Disease 2019 (COVID-19), emerged in a seafood and poultry market in the Chinese city of Wuhan in 2019 (Li et al. 2020). Cases have been detected in most countries worldwide, and on March 11, 2020, the World Health Organization characterized the outbreak as a pandemic. Other coronaviruses that have afflicted humans to date include SARS-CoV (identified in 2003) and MERS-CoV (Middle East Respiratory Syndrome coronavirus; first reported in Saudi Arabia in 2012). Because these three viruses belong to the same family, they share significant similarities, including severe symptoms and high pathogenicity in humans. However, despite the structural similarities and other common features of pathogenicity, studies have identified differences, among others in the spike protein (S), which is believed to be of great importance for the more efficient transmission and spread of SARS-CoV-2 (Ou et al. 2020). It is therefore important to determine the differences in protein structure among these coronaviruses in order to elucidate the mechanisms and factors that underlie the virulence of each virus.

Among the protein domains common to SARS viruses are the so-called SARS Unique Domains (SUDs), first identified in SARS-CoV. The polypeptide that comprises the SUDs is part of the non-structural protein 3 (nsP3) and is named nsP3c. It consists of three separate domains: SUD-N, located at the N-terminus of the SUD; SUD-M (middle domain); and SUD-C, the smallest of the three (Tan et al. 2007; Johnson et al. 2010; Serrano et al. 2009; Kusov et al. 2015; Burrell et al. 2017; Lei et al. 2018). The three SARS-CoV-2 domains share sequence identities of 68.57%, 81.6% and 73.44% with the SARS-CoV SUD-N, SUD-M and SUD-C domains, respectively (Fig. 1).

Fig. 1

Sequence alignments of the SUD-M and SUD-C domains of SARS-CoV and SARS-CoV-2. Amino acid numbering is according to the sequence of the multi-domain non-structural protein 3c (nsP3c). The color coding is dark blue for conserved residues, light blue for conserved residue types and white for non-conserved residues
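The percent identities quoted for the SUD domains can be reproduced from a pairwise alignment with a few lines of code. The sketch below is illustrative only: the function name, the toy sequences and the choice of denominator (aligned columns in which neither sequence has a gap) are assumptions, since the text does not state which convention was used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap columns are excluded from the denominator. Other
    conventions (e.g. dividing by the full alignment length) give
    different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments for illustration, NOT the real SUD sequences.
toy_a = "MKT-AYIAKQR"
toy_b = "MKTSAYLAKQK"
print(f"{percent_identity(toy_a, toy_b):.2f}%")
```

Applied to the actual alignments of Fig. 1, a function of this kind would be expected to return values close to those reported, provided the same gap convention is used.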

SUD-N and SUD-M exhibit a macro-like α/β/α sandwich fold of ~ 120–140 amino acids. According to the literature, they bind oligonucleotides in preference to ADP-ribose (ADPr), they lack the capacity of macro domains to hydrolyze attached ADPr molecules, and they may be unable to de-MARylate substrates (Tan et al. 2009; Alhammad et al. 2020). In contrast to these two domains, SUD-C is the shortest domain (~ 60–70 amino acids) and has a frataxin-like, or double-wing motif, α/β fold consisting of five antiparallel β sheets packed against two α helices (Johnson et al. 2010; Chatterjee et al. 2009; Tan et al. 2009). The proposed function of SUD-M and SUD-C is the binding of G-quadruplex-forming RNA (Hammond et al. 2017). It has also been reported that SUD-C from a bat coronavirus has DNA- and metal ion-binding properties (Staup et al. 2019). Specifically, SUD-M as a single domain has been reported to bind (GGGA)2 and (GGGA)5 as well as (GGGA)2GG, whereas SUD-MC as a double domain binds only (GGGA)2GG and not (GGGA)2 or (GGGA)5, suggesting that SUD-C might tune the binding selectivity of the SARS Unique Domain (Johnson et al. 2010). Moreover, in vivo experiments identified SUD-M as essential for replication of the viral genome, in contrast to SUD-N and SUD-C, which are nonessential (Kusov et al. 2015). One of the reported features of SUDs is their interaction with host proteins such as RCHY1, an E3 ligase that regulates the function of the p53 protein, which might have an antiviral role (Ma-Lauer et al. 2016). These interactions might also be crucial for the acute symptoms experienced by individuals infected with the virus. In addition, a recent study demonstrated that SUD-MC interacts with specific cellular components, affecting pulmonary inflammation (Chang et al. 2020).
Conformational dynamics and interaction properties of SUDs may be of great interest for the detailed functional characterization of the viral components and/or the discovery or the identification of new lead compounds that bind to these proteins in the quest for new antiviral drugs.

We report herein the complete backbone and side-chain chemical shift assignments of SARS-CoV-2 SUD-M and SUD-C (spanning residues 551–675 and 680–743, respectively, according to nsP3 numbering). These data can be exploited to elucidate, at the atomic level, the structure and dynamics of these domains and their interactions with a library of chemical compounds with potential antiviral properties.

Methods and experiments

Construct design

The coding sequences of the SUD-M domain (residues 551–675 of nsP3) and the SUD-C domain (residues 680–743 of nsP3) were amplified using the primers (fwd: 5′ GAATTCCATATGGGTACCGTGAGCTGGAAC 3′ and rev: 5′ CCGCTCGAGTTATTAGCTGCTGGTCAG 3′) and (fwd: 5′ CGCGGATCCGAGGAACACTTCATCG 3′ and rev: 5′ CCGCTCGAGTTATTAGCTCAGCAGGG 3′), respectively. A cDNA sequence encoding nsP3 residues 201–745 (GenBank entry MT066156.1, nucleotides 3319–4954 of the whole genome; GenBank entry QIA98553, orf1ab, protein residues 1019–1563) was used to design the primers. This sequence was synthesized and codon-optimized for expression in Escherichia coli by GenScript (Piscataway, NJ). The SARS-CoV-2 SUD-M coding sequence was cloned into the pET28a(+) expression vector, which contains an N-terminal His-tag followed by a thrombin cleavage site. The produced protein contained four artificial N-terminal residues (GSHM) preceding the native protein sequence. The SARS-CoV-2 SUD-C coding sequence was cloned into the pGEX4T-1 expression vector, which contains an N-terminal GST-tag followed by a thrombin cleavage site. The produced protein contained two artificial N-terminal residues (GS) preceding the native protein sequence.
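The primer sequences above can be checked computationally for the cloning features they carry. The following sketch scans each primer for the recognition sequences of four common restriction enzymes and counts the stop codons that the reverse primers place immediately upstream of the restriction site on the coding strand. The enzyme table and helper names are assumptions made for illustration; the text does not state which enzymes were used, so a match indicates only that the recognition sequence is present.

```python
# Standard recognition sequences; which enzymes were actually used for
# cloning is NOT stated in the text and is an assumption here.
SITES = {"EcoRI": "GAATTC", "NdeI": "CATATG", "BamHI": "GGATCC", "XhoI": "CTCGAG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Primer sequences copied from the Construct design section (5' to 3').
PRIMERS = {
    "SUD-M fwd": "GAATTCCATATGGGTACCGTGAGCTGGAAC",
    "SUD-M rev": "CCGCTCGAGTTATTAGCTGCTGGTCAG",
    "SUD-C fwd": "CGCGGATCCGAGGAACACTTCATCG",
    "SUD-C rev": "CCGCTCGAGTTATTAGCTCAGCAGGG",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sites_in(seq: str) -> list:
    """Names of the enzymes whose recognition sequence occurs in seq."""
    return [name for name, site in SITES.items() if site in seq]


def stops_before_site(rev_primer: str, site: str) -> int:
    """Count consecutive stop codons directly upstream of `site`
    on the coding strand (the reverse complement of the reverse primer)."""
    coding = reverse_complement(rev_primer)
    end = coding.index(site)
    count = 0
    while end >= 3 and coding[end - 3:end] in STOP_CODONS:
        count += 1
        end -= 3
    return count


for name, seq in PRIMERS.items():
    print(name, sites_in(seq))
for name in ("SUD-M rev", "SUD-C rev"):
    print(name, "stop codons:", stops_before_site(PRIMERS[name], SITES["XhoI"]))
```

The presence of a CATATG sequence in the SUD-M forward primer and a GGATCC sequence in the SUD-C forward primer is consistent with the GSHM and GS residues that remain at the N-termini of the two constructs, although this reading is an inference and not a statement from the text.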

Protein expression and uniform 15N and 15N/13C labeling

SARS Unique Domain M (SUD-M) and SARS Unique Domain C (SUD-C) were expressed in 0.5 L of M9 medium (40 mM Na2HPO4, 22 mM KH2PO4, 8 mM NaCl) containing 0.5 g 15N-labeled NH4Cl, 2 g unlabeled or 13C-labeled d-glucose, 1 mL of a solution containing 0.5 mg/L biotin, 0.5 mg/L thiamin, 0.5 mL 1 M MgSO4, 0.15 mL 1 M CaCl2, 1 mL solution Q (40 mM HCl, 50 mg/L FeCl2·4H2O, 184 mg/L CaCl2·2H2O, 64 mg/L H3BO3, 18 mg/L CoCl2·6H2O, 4 mg/L CuCl2·2H2O, 340 mg/L ZnCl2, 605 mg/L Na2MoO4·2H2O, 40 mg/L MnCl2·4H2O), and 1 mg/L kanamycin (for SUD-M) or 1 mg/L ampicillin (for SUD-C). The medium was inoculated with an LB preculture of BL21 (DE3) E. coli cells transformed with the corresponding plasmid, which had been grown overnight at 37 °C and 180 rpm. The culture was incubated at 37 °C and 180 rpm until the OD600 was between 0.6 and 0.8; IPTG was then added to a final concentration of 1 mM, and the culture was incubated overnight (16 h) at 18 °C.
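For convenience, the molar concentrations of the M9 salts can be converted into the masses required for a 0.5 L culture. The sketch below assumes that the anhydrous salts are used and takes their molar masses from standard atomic weights; hydrated forms would require different values. The function and dictionary names are illustrative.

```python
# Molar masses in g/mol for the ANHYDROUS salts (assumption: the text
# does not specify the hydration state of the salts used).
MOLAR_MASS = {"Na2HPO4": 141.96, "KH2PO4": 136.09, "NaCl": 58.44}

# Final concentrations in mM, as given for the M9 medium.
CONC_MM = {"Na2HPO4": 40.0, "KH2PO4": 22.0, "NaCl": 8.0}

CULTURE_VOLUME_L = 0.5


def grams_needed(conc_mm: float, molar_mass: float, volume_l: float) -> float:
    """Mass (g) of a salt needed for the given concentration and volume."""
    return conc_mm / 1000.0 * volume_l * molar_mass


for salt, conc in CONC_MM.items():
    mass = grams_needed(conc, MOLAR_MASS[salt], CULTURE_VOLUME_L)
    print(f"{salt}: {mass:.2f} g per {CULTURE_VOLUME_L} L")
```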

Protein purification was performed according to standard protocols; details will be published elsewhere. The final NMR samples (concentration 0.9 mM for SUD-M and 0.7 mM for SUD-C) were prepared by adding 10% D2O and 0.25 mM DSS.

Data acquisition, processing and assignment

Protein NMR samples of the SUD-M and SUD-C domains were prepared in 500 μL of buffer at pH 7.2 containing 50 mM NaPi, 50 mM NaCl, 10% D2O, 2 mM DTT, 2 mM EDTA, 2 mM NaN3, bacterial inhibitor cocktail (Sigma Aldrich®) and 0.25 mM DSS (4,4-dimethyl-4-silapentane-1-sulfonic acid) as internal 1H chemical shift standard. 13C and 15N chemical shifts were referenced indirectly to the 1H standard using conversion factors derived from the ratios of NMR frequencies (Wishart et al. 1995). The protein concentration in the NMR sample was 0.9 mM for SUD-M and 0.7 mM for SUD-C. All NMR experiments were recorded at 298 K on a Bruker Avance III High-Definition four-channel 700 MHz NMR spectrometer equipped with a cryogenically cooled 5 mm 1H/13C/15N/D Z-gradient probe (TCI). The NMR experiments acquired for sequence-specific assignment are summarized in Table 1. Backbone and side-chain assignments for both SARS-CoV-2 SUD-M and SUD-C domains were obtained from the following series of heteronuclear experiments: 2D [1H,15N]-HSQC and 2D [1H,15N]-TROSY, 3D HN(CO)CA, 3D HNCA, 3D TROSY HN(CO)CACB, 3D TROSY HNCACB, 3D HN(CA)CO, 3D HNCO, 3D HNHA, 3D HBHA(CO)NH, aliphatic 3D (H)CCH TOCSY, 2D [1H,13C]-HSQC and 3D 15N-edited NOESY (Table 1). We also performed selective CBCA(CO)NH experiments to help identify residues without CG, such as Ala, Cys and Ser (Table 1). All NMR data were processed with TOPSPIN 4.0.6 and analyzed with CARA 1.9.2a4 (Keller 2004).
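The indirect referencing described above can be illustrated with a short calculation. In this sketch, the 0 ppm frequency of 13C or 15N is obtained by multiplying the observed 1H frequency of DSS by the corresponding frequency ratio (Ξ). The Ξ values are the commonly tabulated ratios for DSS (13C) and liquid NH3 (15N); the 1H frequency of 700.13 MHz is a hypothetical placeholder, not a value taken from this work.

```python
# Frequency ratios (Xi) relative to the 1H resonance of DSS at 0 ppm.
# 13C is referenced to DSS, 15N to liquid NH3 (commonly tabulated values).
XI = {"13C": 0.251449530, "15N": 0.101329118}


def zero_ppm_frequency(h1_dss_mhz: float, nucleus: str) -> float:
    """Absolute frequency (MHz) corresponding to 0 ppm for the nucleus."""
    return h1_dss_mhz * XI[nucleus]


def chemical_shift_ppm(observed_mhz: float, zero_mhz: float) -> float:
    """Chemical shift (ppm) of a signal relative to the 0 ppm frequency."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6


# Hypothetical 1H frequency of the DSS methyl signal on a 700 MHz magnet.
h1_dss = 700.13

zero_c13 = zero_ppm_frequency(h1_dss, "13C")
zero_n15 = zero_ppm_frequency(h1_dss, "15N")
print(f"13C 0 ppm: {zero_c13:.6f} MHz")
print(f"15N 0 ppm: {zero_n15:.6f} MHz")
```

In practice the spectrometer software applies this conversion when the reference frequencies are set; the calculation is shown only to make the relationship explicit.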

Table 1 List of the NMR experiments acquired, including the main parameters used, to perform the sequence-specific assignment of the backbone and side chains of the nsP3c SUD-M and SUD-C domains

Extent of assignments and data deposition

The 2D 1H,15N-HSQC spectra show well-dispersed amide signals, as shown in Fig. 2 for SUD-M and in Fig. 3 for SUD-C. For nsP3c SUD-M, we assigned 98.6% of the resonances of the backbone atoms (HN, N, Hα, CO, Cα and Cβ) and about 70% of all side-chain atoms, including the aromatic rings. The only unassigned HN and N resonances of nsP3c SUD-M belong to Ala579 and Gly590, together with the N of Pro572, Pro631, Pro654 and Pro662 (which cannot be assigned with classic proton-detected triple-resonance experiments). Gly590 and the prolines are located in loops or unstructured regions, as shown in Fig. 4, whereas Ala579 is positioned at the beginning of an α-helix. For nsP3c SUD-C, we assigned 99.1% of the resonances of the backbone atoms (HN, N, Hα, CO, Cα and Cβ) and 80% of all side-chain atoms, including the aromatic rings. The only unassigned HN and N resonances of nsP3c SUD-C belong to Gln704 and to the backbone nitrogen of Pro723, which are located in loops or less structured regions, as shown in Fig. 5.
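An assignment-completeness percentage of the kind reported above can be computed by comparing the atoms expected for each residue with those actually assigned. The sketch below is a minimal illustration with toy data. The atom names, the exclusion of the amide proton for proline and of Cβ for glycine, and the treatment of Hα as a single atom are assumptions, because the exact counting convention is not given in the text.

```python
# Backbone atom set considered per residue (assumed naming: H = HN, C = CO).
BACKBONE = ("H", "N", "HA", "C", "CA", "CB")


def expected_atoms(residue_type: str) -> set:
    """Backbone atoms expected for a residue type (assumed convention)."""
    atoms = set(BACKBONE)
    if residue_type == "PRO":
        atoms.discard("H")   # proline has no amide proton
    if residue_type == "GLY":
        atoms.discard("CB")  # glycine has no C-beta
    return atoms


def completeness(residues, assigned) -> float:
    """Percent of expected atoms that carry an assignment.

    residues: list of (residue_number, residue_type)
    assigned: dict mapping residue_number -> set of assigned atom names
    """
    expected_total = 0
    assigned_total = 0
    for number, rtype in residues:
        expected = expected_atoms(rtype)
        expected_total += len(expected)
        assigned_total += len(expected & assigned.get(number, set()))
    return 100.0 * assigned_total / expected_total


# Toy data for illustration only, NOT taken from the deposited assignments.
toy_residues = [(1, "ALA"), (2, "GLY"), (3, "PRO"), (4, "SER")]
toy_assigned = {
    1: {"H", "N", "HA", "C", "CA", "CB"},
    2: {"HA", "C", "CA"},        # amide H and N missing
    3: {"HA", "C", "CA", "CB"},  # proline N missing
    4: {"H", "N", "HA", "C", "CA", "CB"},
}
print(f"{completeness(toy_residues, toy_assigned):.1f}%")
```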

Fig. 2

700 MHz 1H,15N-HSQC assigned spectrum of the 0.9 mM 13C,15N-labelled SARS-CoV-2 SUD-M nsP3c in 50 mM NaPi pH 7.2, 50 mM NaCl, 2 mM EDTA, 2 mM DTT, 0.25 mM DSS and 10% D2O acquired at 298 K. Amino acid numbering is according to the sequence of the multi-domain non-structural protein 3c (nsP3c)

Fig. 3

700 MHz 1H,15N-HSQC assigned spectrum of the 0.7 mM 13C,15N-labelled SARS-CoV-2 SUD-C nsP3c in 50 mM NaPi pH 7.2, 50 mM NaCl, 2 mM EDTA, 2 mM DTT, 0.25 mM DSS and 10% D2O acquired at 298 K. Amino acid numbering is according to the sequence of the multi-domain non-structural protein 3c (nsP3c)

Fig. 4

Predicted secondary structure of SARS-CoV-2 SUD-M nsP3c using TALOS+

Fig. 5

Predicted secondary structure of SARS-CoV-2 SUD-C nsP3c using TALOS+

Secondary structure predictions for both SUD domains (M and C) were performed with TALOS+ (Shen et al. 2009) using the chemical shift assignments of six atoms (HN, Hα, Cα, Cβ, CO, N) for each residue in the sequence. The secondary structure elements of the SUD-M protein (125 a.a.) are organized in the following order from N- to C-terminus: β/α/β/α/β/β/α/β/α/α/β/α (Fig. 4). The order of the secondary structure segments is very similar to that of the nsP3b protein (Cantini et al. 2020) and of the SUD-N domain of nsP3c, apart from two extra β strands and one extra α-helix. This domain also shows high secondary structure identity with the SUD-M domain from SARS-CoV. The secondary structure elements of the SUD-C protein (64 a.a.) are organized in the following order from N- to C-terminus: α/β/β/β/β/α (Fig. 5). This domain shows high secondary structure identity with the previously characterized SUD-C domain from SARS-CoV, and its fold is very similar (Johnson et al. 2010).
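The element order quoted above (for example α/β/β/β/β/α for SUD-C) is a condensed form of a per-residue prediction. A minimal sketch of this condensation is given below, assuming a per-residue string in which H denotes helix, E denotes strand and L denotes loop. The minimum run length of three residues and the toy input are assumptions chosen for illustration and do not reproduce the TALOS+ output for either domain.

```python
from itertools import groupby

# Map per-residue states to the symbols used in the text.
SYMBOL = {"H": "α", "E": "β"}


def element_order(ss: str, min_len: int = 3) -> str:
    """Collapse a per-residue secondary structure string into element order.

    Runs of H or E shorter than `min_len` residues are ignored
    (assumed threshold); loop residues (L) only separate elements.
    """
    elements = []
    for state, run in groupby(ss):
        length = len(list(run))
        if state in SYMBOL and length >= min_len:
            elements.append(SYMBOL[state])
    return "/".join(elements)


# Toy per-residue prediction for illustration only.
toy_prediction = "LLEEEEELLLHHHHHHHHLLEEEELLHHHHHHLL"
print(element_order(toy_prediction))  # β/α/β/α
```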

Chemical shift values for the 1H, 13C and 15N resonances of SARS-CoV-2 nsP3c SUD-M and SUD-C have been deposited at the BioMagResBank (https://www.bmrb.wisc.edu) under accession numbers 50516 and 50517, respectively.