Main

Women become more susceptible to malaria infection during pregnancy despite pre-existing immunity acquired from childhood, causing substantial risk of severe outcomes for the mother and her offspring1. Placental malaria is caused by the accumulation of Plasmodium falciparum-infected erythrocytes in the placenta of pregnant women, resulting in high rates of maternal anaemia, low birth weight, stillbirth and spontaneous pregnancy loss1,2,3. Each year, up to 200,000 infant deaths and 10,000 maternal deaths are attributed to malaria infection in pregnancy globally4,5. However, women naturally acquire resistance to placental malaria over successive pregnancies, providing a strong basis for the development of vaccines to prevent placental malaria6,7.

P. falciparum expresses a family of proteins, referred to as erythrocyte membrane protein 1 (PfEMP1), that are translocated to the surface of the infected erythrocyte to enable adherence to different host organs and to evade the host immune response8. The leading placental malaria vaccine candidate VAR2CSA is a member of the PfEMP1 family that specifically binds to the syncytiotrophoblast surface receptor chondroitin sulfate A (CSA), leading to placental malaria9,10. Due to its large size (including an ~310-kDa extracellular domain (Fig. 1a)), production of VAR2CSA protein for vaccine development and scientific study has proved to be challenging11. Furthermore, the highly polymorphic nature of the extracellular domain of VAR2CSA in parasite isolates may hinder the development of a strain-transcending vaccine12,13. Last, vaccine-induced and naturally acquired immunity may differ in important ways that need to be carefully examined.

Fig. 1: Overall structure of the CSA–VAR2CSA NF54 complex.

a, Schematic of VAR2CSA NF54 primary structure coloured by domain. Domains that were excluded from the ectodomain expression construct or could not be visualized in the final map are coloured white. TM, transmembrane domain; ATS, acidic terminal sequence. The alignments of PRIMVAC, PAMVAC and rVAR2 polypeptide are indicated below. b, Left: two views of the cryo-EM density for the 3.36 Å core structure. Right: the same two views of the atomic model corresponding to the map. Each domain is coloured as in a. The CSA major and minor binding channels are highlighted by arrows. The CSA polymer in the major binding channel is coloured dark blue and the CSA monosaccharide in the minor binding channel magenta. c, Two views of the cryo-EM density for the entire complex with the model docked inside. The full-length density is the combination of the core and arm after local refinement. d, Schematic drawing of the CSA–VAR2CSA NF54 complex. Each line indicates interactions between the connecting domains. The major binding channel and minor binding channel are highlighted by the dark-blue hexagon and magenta triangle. IEM, infected erythrocyte membrane.

The ectodomain of VAR2CSA consists of an N-terminal sequence (NTS), six Duffy-binding-like (DBL) domains and three interdomain regions (IDs) organized as shown in Fig. 1a. ID2 is also referred to as the cysteine-rich interdomain region (CIDRPAM). Low-resolution small-angle X-ray scattering and negative-stain electron microscopy (EM) studies indicate that the ectodomain of VAR2CSA adopts a compact structure10,14,15. In vitro studies suggest that the segment comprising DBL1X to ID2a is sufficient to bind CSA and is considered the minimal CSA-binding region, although DBL3X and DBL6ɛ have also shown CSA-binding activity and the full-length ectodomain exhibits higher affinity than any region alone10,14,16,17,18,19,20,21,22,23. Two candidate vaccines, PRIMVAC and PAMVAC, derived from the VAR2CSA segments within DBL1X to ID2a, are currently in phase I/II clinical trials24,25,26 (Fig. 1a). Immunization with these VAR2CSA segments generates homologous inhibitory antibodies, but limited heterologous activity against disparate strains25,26. These findings highlight the need to develop a strain-transcending vaccine against placental malaria based on VAR2CSA.

Intriguingly, diverse cancer cells express and present the form of chondroitin sulfate that is typically found exclusively in the placenta, and a recombinant VAR2CSA fragment (rVAR2) conjugated to therapeutics could inhibit tumour cell growth in vivo27 (Fig. 1a). Thus, VAR2CSA has been used to develop platforms for cancer diagnosis and therapeutics27,28,29,30. Despite the importance of VAR2CSA in both malaria and cancer, two diseases of global importance, critical information has been lacking about the specific recognition mechanism for VAR2CSA binding CSA. In the present study, we present the cryo-EM structures of the full-length ectodomain of VAR2CSA in both ligand-binding and ligand-free states. The structures reveal that the CSA-binding sites reside in two binding channels within the core structure of VAR2CSA. This work elucidates the sequestration mechanism of placental malaria and has direct implications for development of malaria vaccines as well as cancer therapeutics and diagnostics.

Results

Overall structure of the CSA–VAR2CSA complex

We expressed VAR2CSA from parasite strain NF54 (VAR2CSA NF54) in Expi293 cells and purified it for the cryo-EM study of VAR2CSA in complex with CSA (Extended Data Fig. 1a,b); 6,196 videos were collected allowing for a 3.82 Å reconstruction of VAR2CSA NF54 in complex with CSA (Extended Data Fig. 2a). VAR2CSA NF54 exhibits an architecture comprising a stable core and a flexible arm (Fig. 1a–c). Local refinement of the core improved the resolution to 3.36 Å, and local refinement of the arm resulted in a 4.88 Å map (Extended Data Fig. 2a–f). A CSA dodecamer spans the core domain and binds in a channel termed the ‘major binding channel’ (Fig. 1b–d). Another potential binding site for CSA was observed in a second channel termed the ‘minor binding channel’ with weak density that could be modelled as a CSA monosaccharide (Fig. 1b,d). This binding of CSA polymer within channels of VAR2CSA is reminiscent of the binding model proposed for EBA-175 binding to glycophorin A during P. falciparum invasion of erythrocytes, where the glycophorin A receptor feeds through channels created by EBA-175 (ref. 31). EBA-175 is a protein related to VAR2CSA that belongs to the erythrocyte binding-like (EBL) family involved in the recognition of sialic acid on erythrocyte glycoproteins during erythrocyte invasion by P. falciparum32,33,34. Our final model for VAR2CSA NF54 spans residues 32–2,607 of VAR2CSA NF54 with a few flexible loops and ID1 omitted because these segments were not ordered in the reconstruction (Fig. 1c, Extended Data Fig. 2g–i and Supplementary Table 1).

An interwoven domain architecture stabilizes VAR2CSA

VAR2CSA is primarily composed of α-helices and extensive loops that adopt an overall shape resembling the number 7 (Fig. 1c). CryoSPARC three-dimensional (3D) variability analysis confirms that the region composed of DBL2X to ID3 forms a relatively stable core, whereas DBL5ɛ–DBL6ɛ forms a flexible arm and DBL1X exhibits some structural flexibility (Supplementary Video 1). The six individual DBL domains of VAR2CSA adopt the classic DBL domain fold, consisting of an α-helical core decorated by extensive loops31,35,36,37 (Fig. 2a). The individual domains interact in an interwoven manner to stabilize the compact tertiary structure (Fig. 1c,d). DBL4ɛ, the most conserved DBL domain of the six38, is located at the centre of VAR2CSA, and unites the whole structure by directly interacting with all the other domains except DBL1X and DBL5ɛ (Fig. 1c,d). DBL1X and DBL5ɛ are connected to DBL4ɛ via the NTS and ID3, respectively (Fig. 1c,d). The NTS (residues 32–49) is a twisted loop surrounding DBL1X and serves as the mortar holding DBL1X and DBL4ɛ together, with high conservation among diverse VAR2CSA strains (Fig. 2a–c). ID3 is a long helix that closely interacts with ID2 and connects DBL5ɛ with the core (Fig. 2a,d). A total of 31 pairs of disulfide bridges was identified in the final model (Fig. 2e and Supplementary Table 2).

Fig. 2: Domain composition of VAR2CSA.

a, The models of the NTS, six DBL domains and two ID regions are shown according to the order of the protein sequence. Each domain is coloured according to Fig. 1a. b, NTS unites DBL1X and DBL4ɛ. NTS is shown as a surface whereas DBL1X and DBL4ɛ are shown as a ribbon. All the domains are coloured according to Fig. 1a. c, Sequence alignment of NTS among different VAR2CSA variants. The range of the final model of the NTS is highlighted by the yellow bar above the sequences. d, ID3 is an α-helix that connects DBL4ɛ and DBL5ɛ. ID3 is shown as a surface whereas the rest of the molecule is shown as a ribbon. All the domains are coloured according to Fig. 1a. e, The models of VAR2CSA NF54 are coloured grey, shown in two different views. The disulfide bonds are shown as yellow spheres.

Structural conservation within the EBL and PfEMP1 families

The VAR2CSA structure represents the first characterized structure of a full-length PfEMP1 protein, and provides the first structural models for DBL1X, DBL2X, ID2a, ID2b, ID3 and DBL5ɛ (Fig. 2a). We performed structural alignments for these domains using the DALI search39. As expected, DBL1X, DBL2X and DBL5ɛ adopt structures similar to other DBL domains from PfEMP1 and EBA-175 (refs. 31,36,37,40) (Extended Data Fig. 3a–c). We also observed a tandem packing of the dual DBL domains DBL3X/4ɛ and DBL5ɛ/6ɛ. These dual DBL domains exhibit a twisted pattern reminiscent of other tandem packed DBL pairs of EBA-140 and EBA-175, although the angle between DBL domains differs (Extended Data Fig. 3d–g)31,41. EBA-140 and EBA-175 both belong to the EBL family that mediates the recognition of sialic acid on erythrocyte glycoproteins32.

The arrangement of DBL2X–ID2 represents a conserved architecture within the PfEMP1 protein family. The structure of DBL2X–ID2 from VAR2CSA is similar to the DBL1α-CIDRγ domains of PfEMP1–VarO, although they adopt different DBL–ID/CIDR orientation (Extended Data Fig. 3h). VarO binds the ABO blood group trisaccharide that mediates rosetting of infected red blood cells40. The individual DBL domains (DBL2X and DBL1α) are structurally similar, and the VAR2CSA ID2b domain has a strong similarity to the VarO CIDRγ subdomain 2 despite low sequence similarity (Extended Data Fig. 3b,i,j). The DBL–ID/CIDR angle differs between VAR2CSA and PfEMP1–VarO, but this tandem arrangement suggests that the DBL–ID/CIDR pairing among other PfEMP1 family members may have a similar architecture. These structural delineations will better inform and define the diverse PfEMP1 domain architectures.

Multiple domains within the core domains create major and minor CSA-binding channels

Previous studies have shown that VAR2CSA tends to bind the sulfate-clustered domains of the chondroitin sulfate proteoglycans in the intervillous spaces of the placenta42, and a minimum of a CSA dodecasaccharide is required for efficient binding42,43,44,45. Indeed, the atomic resolution reconstructions allowed assignment of a CSA polymer comprising 12 monomers bound in a positively charged channel that is formed by NTS, DBL1X, DBL2X and DBL4ɛ (Fig. 3a,b). We name this channel the major CSA-binding channel. Five N-acetylgalactosamine-4-sulfate (ASG) and six glucuronic acid (BDP) residues could be unambiguously assigned and built into the density. Furthermore, density for an additional residue is observed at the start of the chain that can accommodate a monosaccharide, but this density was not of sufficient quality to facilitate adequate modelling of this single residue (Fig. 3a).

Fig. 3: CSA-binding sites within the major binding channel.

a, Two views of the structure showing that a dodecamer of CSA is bound in the major binding channel. The cryo-EM map densities of NTS, DBL1X, DBL2X and DBL4ɛ are shown as solid with transparency. The cryo-EM density of CSA is shown as a mesh overlaid on the CSA model as a stick. The left and right monosaccharides are BDP-12 and BDP-2, indicated by the numbers 12 and 2, respectively. The density for the first monosaccharide of the chain is also observed and labelled 1. Binding sites 1 and 2 are highlighted by a purple oval and an orange rectangle, respectively. b, Electrostatic surface of VAR2CSA showing the positively charged binding channel of CSA. c,d, Detailed interactions between CSA and binding site 1. Each monosaccharide is numbered. The protein sequence number and side chains of the residues involved in CSA recognition are shown. e, Detailed interactions between BDP-2 to ASG-5 and binding site 2. Each monosaccharide is numbered. The protein sequence number and side chains of the residues involved in CSA recognition are shown. f, The CSA molecule in the major binding channel is positioned as in Fig. 2a with numbering of each of the monosaccharides. The domains with which each monosaccharide interacts are indicated below. g, Partial sequence alignment of the residues involved in binding CSA in the major binding channel; the residues in major binding sites 1 and 2 are highlighted on top by orange and purple spheres, respectively. The surface-exposed binding site on DBL2 is highlighted by the pink line.

The major binding channel can be separated into two non-continuous CSA-binding sites (Fig. 3a). The first binding site (major binding site 1) is located on the surface of DBL2X and binds CSA residues BDP-8 to ASG-11 (Fig. 3a,c,d). The sulfate group of ASG-11 forms hydrogen bonds with N557 whereas BDP-10 has interactions with R829, K561 and the main chain of A822 (Fig. 3c). ASG-9 forms multiple hydrogen bonds with K562, N576, K828 and Q832 (Fig. 3d). The interaction of CSA with major binding site 1 is further strengthened by the hydrogen bonds between BDP-8 and K828 (Fig. 3d).

The second binding site (major binding site 2) lies deep in the hole of the funnel-shaped channel and is surrounded by NTS, DBL1X, DBL2X and DBL4ɛ (Fig. 3a,b). Multiple hydrogen bonds are also formed in this region: ASG-5 with K835, E1880 and K1889; BDP-4 with K48; ASG-3 with K48 and R846; and BDP-2 with R846 and the main chain of I1785 (Fig. 3e). Y45 further stabilizes the interaction by packing tightly with BDP-2 (Fig. 3e). BDP-6 and ASG-7 do not exhibit direct interactions with VAR2CSA and may serve to link the two binding sites together (Fig. 3f).
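The contacts listed in the two preceding paragraphs can be collected into a small lookup table, which makes it easy to see which protein residues serve each binding site and which sugars are left without direct contacts. The following Python sketch is only an illustrative bookkeeping aid: the contact list is transcribed from the text above, while the names `CONTACTS`, `site_of`, `residues_by_site` and `uncontacted` are hypothetical and not part of the study. The `(mc)` suffix is an ad hoc label for main-chain contacts, and Y45 is recorded as a packing contact rather than a hydrogen bond.

```python
# CSA monosaccharide -> VAR2CSA residues it contacts, as stated in the text.
# "(mc)" marks a main-chain contact; Y45 is a packing contact, not an H-bond.
CONTACTS = {
    "BDP-2": ["R846", "I1785(mc)", "Y45"],
    "ASG-3": ["K48", "R846"],
    "BDP-4": ["K48"],
    "ASG-5": ["K835", "E1880", "K1889"],
    "BDP-6": [],
    "ASG-7": [],
    "BDP-8": ["K828"],
    "ASG-9": ["K562", "N576", "K828", "Q832"],
    "BDP-10": ["R829", "K561", "A822(mc)"],
    "ASG-11": ["N557"],
}


def site_of(sugar):
    """Assign a monosaccharide to a binding site from its chain position."""
    position = int(sugar.split("-")[1])
    if 2 <= position <= 5:
        return "site 2"   # buried site (BDP-2 to ASG-5)
    if 8 <= position <= 11:
        return "site 1"   # surface site on DBL2X (BDP-8 to ASG-11)
    return "linker"


def residues_by_site():
    """Group the unique contacting protein residues by binding site."""
    grouped = {}
    for sugar, partners in CONTACTS.items():
        grouped.setdefault(site_of(sugar), set()).update(partners)
    return grouped


def uncontacted():
    """Monosaccharides with no direct protein contact, in chain order."""
    return [sugar for sugar, partners in CONTACTS.items() if not partners]


if __name__ == "__main__":
    for site, residues in sorted(residues_by_site().items()):
        print(site, sorted(residues))
    print("no direct contacts:", uncontacted())
```

Grouping by site shows that K828 is shared between BDP-8 and ASG-9 within site 1, and that the only sugars without contacts are the two that bridge the sites.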

Weak density (the size of a single ASG monosaccharide) was also identified in a separate region of VAR2CSA, which we have termed the minor binding channel that forms a potential second binding site (Extended Data Fig. 4a). The minor binding channel is made up of the residues from the C-terminus of DBL2X and the N-terminus of ID2a, two regions previously implicated in CSA binding16. Similar to the major binding channel, the minor channel is rich in positively charged residues (Extended Data Fig. 4b).

The CSA-binding residues in both channels are highly conserved among different VAR2CSA alleles (Fig. 3g and Extended Data Fig. 4c). In addition, although individual segments of VAR2CSA demonstrate CSA binding, the full-length protein binds CSA with far greater affinity than any segment alone14,19,46. The structure provides a clear rationale for these observations. DBL1X, DBL2X, ID2a, ID2b, DBL3X, DBL4ɛ and ID3 all interact extensively to create an interwoven architecture (Fig. 1b,d). The CSA binding is probably dependent on an intact core structure implicating multiple domains in high-affinity CSA binding.

VAR2CSA adopts preformed CSA-binding channels

In addition to the CSA–VAR2CSA complex, we also solved the structure of CSA-free VAR2CSA from the parasite strain FCR3 (Extended Data Figs. 1 and 5 and Supplementary Table 1). The sequences of these two VAR2CSA alleles share 79% identity (Fig. 4a). The structure of VAR2CSA FCR3 may potentially inform development of a strain-transcending vaccine by revealing any conformational changes due to CSA binding, as well as commonalities and differences between strains. In addition, the current placental malaria candidate vaccines are based on sequences from VAR2CSA NF54 and VAR2CSA FCR3, meaning that an FCR3 structure would facilitate comparison with existing candidate vaccines25,26.
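A figure such as the 79% identity quoted above is typically obtained from a pairwise alignment by counting identical columns. The sketch below shows one common convention, assumed here for illustration: columns containing a gap in either sequence are excluded from the denominator. The text does not state which convention or software was used, and both the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps as `gap`).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real VAR2CSA sequence.
    print(round(percent_identity("ACDEFGHIKL", "ACDEYGHI-L"), 1))
```

Other conventions divide by the full alignment length or by the shorter sequence length, so reported identities can differ slightly between tools.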

Fig. 4: VAR2CSA adopts preformed CSA-binding channels.

a, Domain boundaries of VAR2CSA NF54 and VAR2CSA FCR3 ectodomains. The protein sequence identity between the two is labelled. b, Two views of the cryo-EM density for the 3.38 Å core region of VAR2CSA FCR3. c, Structural alignment of Apo VAR2CSA FCR3, crosslinked VAR2CSA FCR3 and the CSA–VAR2CSA NF54 complex. VAR2CSA NF54 is in pink, CSA in yellow, VAR2CSA FCR3 in green and VAR2CSA FCR3 crosslinked in blue. d, The electrostatic surface of VAR2CSA FCR3 is shown on the left with a zoom-in view of the CSA-binding sites on the right. The major and minor binding channels are indicated by arrows.

We determined the cryo-EM structure of the CSA-free full-length VAR2CSA FCR3 to a resolution of 4 Å after collecting 10,108 videos (Extended Data Fig. 5a–c). The reconstructed map of Apo VAR2CSA FCR3 exhibits a similar shape to the CSA-bound VAR2CSA, and also resembles the number 7 with a stable core and flexible arm (Supplementary Video 2). Local refinement of the arm resulted in a 4.7 Å map (Extended Data Fig. 5d). To further improve the resolution and the accuracy of our atomic model, we crosslinked the full-length ectodomain under mild conditions and collected a second dataset of 4,739 micrograph videos. This dataset resulted in a reconstruction of the stable core comprising DBL1X to ID3 to 3.4 Å resolution, enabling accurate model building for this segment which comprises the core of VAR2CSA (Fig. 4b; Extended Data Figs. 5e–g and 6, and Supplementary Table 1). The 4.7 Å reconstruction from masked local refinement of DBL5ɛ and 6ɛ allowed docking and refinement of the C-α positions of DBL5ɛ as well as the available crystal structure of DBL6ɛ (Protein Data Bank (PDB), accession no. 2Y8D) into this map (Extended Data Fig. 6g)47. Our final model for VAR2CSA FCR3 spans residues 23–2,602 of VAR2CSA with a few flexible loops and ID1 omitted, because these segments were not ordered in the reconstruction (Extended Data Fig. 6d). Comparison of the DBL1X-ID3 map generated from the crosslinked and non-crosslinked sample reveals no noticeable conformational changes in the core, indicating that the crosslink did not affect conformation (Fig. 4c).

No major conformational changes were observed between the structures of CSA-bound and CSA-free VAR2CSA (Fig. 4c). The structural similarity between VAR2CSA FCR3 and NF54 also suggests different VAR2CSA variants are likely to have similar overall architecture (Fig. 4c). CSA could be well docked in the corresponding major and minor binding channels on VAR2CSA FCR3, which are similarly positively charged (Fig. 4c,d). This suggests that the CSA-binding mode we identified is conserved between strains, and that VAR2CSA does not require major conformational changes to enable CSA binding. However, some flexibility is observed in the region DBL1X–DBL2X, suggesting that limited flexing of the molecule may facilitate CSA binding (Supplementary Video 2).

Analysis of VAR2CSA variability and placental malaria candidate vaccine designs

High sequence polymorphism among diverse VAR2CSA variants is one of the major barriers to strain-transcending vaccine development38,48. We analysed the conservation of 14 VAR2CSA sequences and mapped this on to the structure49 (Fig. 5a). Residues in the CSA-binding sites within the major and minor binding channels are conserved, but the flanking regions are not (Fig. 5b). The high conservation of the residues within both channels that directly bind CSA indicates that these residues are under selective pressure to be maintained across strains. These results suggest that all strains retain these residues to ensure CSA binding. The variability observed in the flanking regions that are distant from the CSA-binding residues and do not directly contact CSA suggests that variability at these positions should not impact CSA binding, but may play a role in immune evasion.
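The idea of scoring each alignment position for conservation can be illustrated with a per-column Shannon entropy, where 0 bits means a position is invariant and larger values mean more variability. This is a deliberately simple stand-in and not the method used in the study: ConSurf estimates phylogeny-aware evolutionary rates rather than raw entropies. The function names and the four-sequence toy alignment below are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gaps.

    Returns NaN for a column that contains only gaps.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return math.nan
    total = len(residues)
    counts = Counter(residues)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def conservation_profile(alignment):
    """Entropy for every column of an alignment given as equal-length strings."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(column) for column in zip(*alignment)]


if __name__ == "__main__":
    # Toy alignment: column 0 invariant, column 1 mostly conserved,
    # column 2 fully variable. Not real VAR2CSA sequence.
    toy = ["KRA", "KRG", "KKS", "KRT"]
    for index, value in enumerate(conservation_profile(toy)):
        print(index, round(value, 3))
```

In this scheme, CSA-contacting positions would be expected to score near 0 bits, with flanking surface positions scoring higher.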

Fig. 5: Variability analysis of VAR2CSA.

a, Fourteen sequences of VAR2CSA that represent the diversity were analysed using ConSurf49. Surface residues on a space-filled model are shaded according to degree of conservation. The colour key is shown below. Four different views are illustrated. b, Left: the atomic model of CSA–VAR2CSA NF54 complex. Right: space-filling models of the CSA–VAR2CSA-binding interface. Surface residues are shaded according to degree of conservation. The colour key is shown below. The surface-exposed major binding site 1 is highlighted by a black dotted circle. c, Left: the structural model of sequences comprising PRIMVAC are shown as a ribbon. The remainder of the VAR2CSA protein is shown as a surface. Right: based on the variability analysis in a, PRIMVAC is shown as bold whereas the rest of the VAR2CSA molecule is shown as transparent. d, Left: the structural model of sequences comprising PAMVAC is shown as a ribbon. The remainder of the VAR2CSA protein is shown as a surface. Right: based on the variability analysis in a, PAMVAC is shown as bold whereas the rest of the VAR2CSA molecule is shown as transparent.

The CSA-binding site 2 is buried deeply in the major binding channel and may not be accessible to antibodies (Fig. 5b). Although binding site 1 is exposed on the VAR2CSA surface, the DBL2X surface surrounding the conserved CSA-binding residues is highly heterogeneous among diverse VAR2CSA strains (Fig. 5b). Moreover, there is also extensive polymorphism surrounding the conserved residues within the minor CSA-binding channel (Fig. 5b). This heterogeneity probably reflects variation induced under host immune pressure. Other than the key CSA-binding residues, a large number of the surface residues are polymorphic among different VAR2CSA strains (Fig. 5a).

The interwoven domain architecture identified in the structure is consistent with the finding that multiple domains play a role in binding CSA, as multiple domains create the binding channels. PAMVAC and PRIMVAC both include DBL2X, and this domain will help to generate CSA-blocking antibodies because it is the major domain contributing residues to binding site 1. However, sequence variability surrounding binding site 1 (Fig. 5b) could potentially limit the induction of strain-transcending antibodies. Indeed, both candidate vaccines demonstrated low heterologous inhibitory activity25,26. PAMVAC and PRIMVAC contain only a portion of the major binding channel and this may explain the limited protection data (Fig. 5c,d). The structure of full-length VAR2CSA reveals larger CSA-binding sites with conserved targets for strain-transcending antibodies. This information will guide improvements on existing candidate vaccines and facilitate structure-based design of a strain-transcending placental malaria vaccine.

Epitopes mapped on VAR2CSA

The structure of full-length VAR2CSA provided a template to investigate previously discovered antibody epitopes. We mapped known epitopes on the structure (Fig. 6). Four multigravidae sera with inhibitory activity showed enhanced binding to distinct linear peptides using overlapping peptide scanning of DBL4ɛ50. All the sera showed antibody binding to peptides P23–P25 and one sample also showed reactivity to peptides P45 and P57. Interestingly, mapping of these peptides on the 3D structure revealed that they all cluster together and are located at the entrance to the deeply buried binding site 2 of the major CSA-binding channel (Fig. 6). Separately, naturally acquired antibodies to ID1–DBL2–ID2a and DBL4ɛ recombinant constructs were found to have inhibitory activity against both homologous and heterologous isolates51, and these results are consistent with the structural analysis identifying these domains as important for CSA binding. We mapped other known epitopes of antibodies from multigravid women (Fig. 6). The epitope of PAM8.1, an antibody derived from a multigravid woman, was mapped to a strain-specific loop region on DBL3X52. However, this loop is not visible in the structure (Fig. 6). Peptide P62 found within DBL3X and peptide P63 within DBL5ɛ are two peptides that react strongly with Tanzanian female plasma53. Last, peptides P20 and P23 are two cryptic epitopes on DBL5ɛ that are shown to cross-react with the antibodies derived from P. vivax DBP54. However, whether these peptide epitopes are neutralizing, and how these antibodies inhibit binding, require further study.

Fig. 6: Human antibody epitopes mapped on VAR2CSA.

The VAR2CSA structure is shown as a surface. The characterized peptide epitopes are coloured as illustrated: P57 (red), P54 (yellow) and P23–P25 (blue) are epitopes on DBL4ɛ. The PAM8.1 epitope is a flexible loop on DBL3X that is missing from the final structure; it is coloured pink and illustrated by a dashed line. P62 on DBL3X is shown as brown and P63 on DBL5ɛ as light green. The cryptic epitopes P20 and P23 on DBL5ɛ are shown as dark green and orange, respectively.

Discussion

The ability to sequester in different organs, combined with sophisticated antigenic diversity, has made P. falciparum the deadliest malaria species to infect humans8. Malaria during pregnancy is a major problem in sub-Saharan Africa, affecting an estimated 150 million pregnant women annually1. Women can become susceptible to malaria infection during pregnancy despite the immunity that might have developed from previous P. falciparum infections. Pregnant women may also serve as a reservoir for parasites, which poses challenges to malaria eradication1. As parasites continue to develop drug resistance and new drugs entail potential teratogenesis, an effective vaccine to prevent placental malaria is urgently needed1,55.

The cryo-EM structures of VAR2CSA in CSA-bound and CSA-free states determined in the present study support a model of binding depicted in Extended Data Fig. 7. We identified a major CSA-binding channel that has two non-continuous CSA-binding sites, and a potential minor CSA-binding channel on VAR2CSA, which are preformed by multiple domains (Supplementary Video 3). Although most of the CSA-binding residues are highly conserved among various VAR2CSA alleles, a few residues at the openings of the binding sites exhibit slight polymorphism (Fig. 3g). In addition, the conserved residues are flanked by highly polymorphic residues (Fig. 5b). These variabilities may contribute to diverse binding affinity and the disease severity of various VAR2CSA isolates56. The surface-exposed binding site 1 of the major binding channel is formed solely by DBL2X (Fig. 3a). The buried binding site 2 of the major binding channel and the minor binding channel are formed by the NTS, DBL1X, DBL2X, ID2a and DBL4ɛ domains (Fig. 3a). The finding that DBL2X appears in all CSA-binding sites suggests its central role in CSA binding. This is consistent with previous studies that identified DBL2X as central to the minimal CSA-binding region suggested for VAR2CSA10,16, and the fact that DBL2X is included in both of the two candidate vaccines currently in clinical trials for placental malaria25,26 (Fig. 1a). However, the multidomain-binding model identifies all the CSA-binding regions and explains why the full-length VAR2CSA has much stronger CSA-binding affinity than any individual or short continuous domains14 (Extended Data Fig. 7a). The present study also identified DBL4ɛ as a key component of the CSA-binding channel. The binding residues of DBL4ɛ are buried in the hole of the channel and they work together with segments from the NTS, DBL1X and DBL2X to form binding site 2 of the major binding channel.

The similar overall architecture of VAR2CSA from parasite strains NF54 and FCR3 implies that it adopts a conserved shape. Some VAR2CSA proteins have been shown to have an additional DBL domain termed ‘DBL7ɛ’57. This DBL domain would be connected to the C-terminus of DBL6ɛ that is fully solvent exposed and away from both the arm and the core of all structures reported in the present study. DBL7ɛ can readily be accommodated as an extension of the arm and is unlikely to alter the remaining architecture of VAR2CSA.

One caveat of the present study is that we used CSA from bovine trachea, which consists of a mixture of CSA with different sulfation patterns and different lengths. Although five of the CSA disaccharides are fully sulfated in the structure, we cannot determine the sulfation status of the first CSA monosaccharide. As the CSA completely traverses through the binding channel of VAR2CSA, it is also plausible that VAR2CSA may slide along a CSA chain to search for a highly sulfated cluster before strong binding. Furthermore, the fact that CSA is tethered to the proteoglycan in the placenta might facilitate the binding of multiple CSA glycans to the different CSA-binding channels on one VAR2CSA molecule or distinct VAR2CSA molecules that are located on the same knob58,59. A second caveat is that, although we observe density at two locations described as the major and minor binding channels, further studies are required to establish the relative importance of each channel in binding. A recent study suggests that phosphorylation of VAR2CSA at residues S429, S433 and T934 is associated with enhanced adhesive properties60. S429 and S433 are located on ID1, which is flexible and disordered in the final reconstruction of the present study, and T934 does not directly mediate the CSA binding. Therefore, how these three phosphoresidues impact the adhesion requires further investigation. Finally, although the resolution of most of the core is at the atomic level, the resolution for the arm is lower and should be interpreted accordingly.

The high variability of VAR2CSA from distinct P. falciparum strains poses a challenge to the development of strain-transcending vaccines for placental malaria. Mapping the VAR2CSA sequence variability on to the 3D structure of VAR2CSA shows that the CSA-binding site 1 on DBL2X is highly conserved but is surrounded by highly polymorphic residues (Fig. 5b). This explains the low heterologous inhibitory activity observed for the placental malaria candidate vaccines PRIMVAC and PAMVAC25,26. The highly polymorphic segments probably impact on antibodies that bind on or close to the CSA-binding sites, preventing the development of antibodies capable of binding to the VAR2CSA variants. The structure of VAR2CSA bound to CSA presented in the present study serves as a template to design and develop vaccines against placental malaria that will overcome strain-specific responses by focusing the immune response to conserved regions.

Multiple pieces of evidence suggest that the immunogens encompassing the region NTS-DBL2X can bind to antibodies from multigravid women living in endemic regions, and can induce protective antibodies in clinical trials25,26,51,61. Intriguingly, the previously identified linear peptide epitopes on DBL4ɛ reside right next to the major CSA-binding channel. It is possible that these linear peptides may be part of larger conformational epitopes that target the major CSA-binding channel.

The limited structural information of full-length PfEMP1 proteins has hampered progress towards understanding PfEMP1 host–parasite interactions and vaccine development. Low-resolution structures of two other PfEMP1 proteins solved by cryo-electron tomography and small-angle X-ray scattering suggest that they adopt shapes that mimic either a crescent or a boomerang62,63,64. However, most PfEMP1 proteins utilize the N-terminal DBL and CIDR domains that correspond to the regions surrounding DBL2X of VAR2CSA to bind diverse host receptors65. The atomic resolution structural information of how these segments in VAR2CSA bind CSA serves as a framework to understand PfEMP1 binding to diverse receptors. Together, these results suggest that different PfEMP1 proteins may adopt various 3D structures, but they may utilize a conserved N-terminal structure for receptor binding.

It is interesting that the form of CSA bound by VAR2CSA is exclusively expressed in the placenta in healthy individuals, but is expressed and presented in cells from diverse cancers of epithelial and mesenchymal origin27. This expression allows for the specific targeting of cancer cells by delivering therapeutics that utilize VAR2CSA as a carrier, and for VAR2CSA-based cancer diagnostics28,29. Clear structural definition of the functional segments from VAR2CSA required to bind CSA will lead to improvements for placental malaria vaccine development, as well as cancer therapeutics and diagnostics (Extended Data Fig. 7b). The rVAR2, which is composed of the DBL1X to ID2a domains of VAR2CSA, has been shown to specifically recognize cancer cells and can be conjugated with drugs to inhibit tumour growth27. It comprises a similar region to the sequences used in PAMVAC and PRIMVAC (Fig. 1a). However, rVAR2 lacks the critical elements for full CSA binding provided by the NTS and DBL4ɛ which form the complete CSA-binding channel (Extended Data Fig. 8). Improving the affinity of VAR2CSA fragments for cancer therapy by structure-guided design may allow for improved treatments that require lower doses for efficacy.

In summary, the present study of VAR2CSA rationalizes available antibody-binding and receptor-binding observations and defines the CSA-binding elements that comprise conserved segments of VAR2CSA to target strain-transcending protective immunity. This information will support precise design of vaccines to provide much-needed medical countermeasures against placental malaria and will inform the development of potent targeted cancer therapeutics and diagnostics.

Methods

Expression of VAR2CSA NF54 and FCR3 in Expi293 cells

The wild-type VAR2CSA NF54 and VAR2CSA FCR3 were expressed in Expi293 (Thermo Fisher Scientific) cells according to the manufacturer’s protocols. In brief, the cells were grown with shaking at 37 °C in 8% CO2, and cultures were maintained in continuous log-phase growth (3.0–5.0 × 10⁶ cells ml−1) for three to four passages after thawing. The day before transfection, 500 ml of culture was seeded at a density of 2.5–3.0 × 10⁶ cells ml−1 in a 2-litre flask. On the day of transfection, cells were diluted back to 2.5–3.0 × 10⁶ cells ml−1 before transfection. The plasmid DNA was diluted with 25 ml of Opti-MEM I medium (Thermo Fisher Scientific) to a final concentration of 1 µg ml−1.

Then, 1.4 ml of ExpiFectamine 293 Reagent (Thermo Fisher Scientific) was diluted with 25 ml Opti-MEM I medium, gently mixed and incubated at room temperature for 5 min. The diluted ExpiFectamine 293 Reagent was then added to the diluted plasmid DNA, mixed by swirling and incubated at room temperature for 20 min. The mixture was added to the cells slowly while swirling the flask. The flask was returned to the incubator at 37 °C and in 8% CO2. After 20 h of incubation, ExpiFectamine 293 Transfection Enhancer 1 (Thermo Fisher Scientific) and ExpiFectamine 293 Transfection Enhancer 2 (Thermo Fisher Scientific) were added to the transfection flask.

Purification of VAR2CSA NF54 and FCR3

The cultures were centrifuged at 5,000 revolutions per min for 15 min at 5 d post-transfection. The supernatant was collected and loaded on to Ni Sepharose Excel columns (GE Healthcare), which were manually packed in a glass gravity column. The column was washed twice with 10 column volumes of wash buffer (25 mM 4-(2-hydroxyethyl)-1-piperazine-ethanesulfonic acid (Hepes), pH 7.4, 150 mM NaCl, 25 mM imidazole) and eluted with 5 column volumes of elution buffer (25 mM Tris-HCl, pH 7.4, 150 mM NaCl, 250 mM imidazole). The eluates were concentrated with a 100-kDa cutoff centrifugal filter unit (Millipore Sigma) to 1 ml and further purified by size-exclusion chromatography (Superose 6 Increase 10/300, GE Healthcare) in buffer A (10 mM Hepes, pH 7.4, 100 mM NaCl). The peak fractions were collected and verified by sodium dodecylsulfate–polyacrylamide gel electrophoresis before EM grid preparation.

On-column cross-linking of VAR2CSA FCR3

To mildly stabilize the protein, on-column cross-linking of VAR2CSA FCR3 was performed as described66. First, a bolus of glutaraldehyde (200 µl 0.25% v:v) was injected into a pre-equilibrated Superose 6 Increase 10/300 column in buffer A and run at 0.25 ml min−1 for 16 min (a total of 4 ml buffer). Then, the column flow was paused and the injection loop was flushed using buffer, followed by injection of purified VAR2CSA FCR3 (200-µl volume, at 3 µM concentration). Subsequently, the column was run at 0.25 ml min−1 and 0.3-ml fractions were collected for EM grid preparation.
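The quantities in this protocol can be cross-checked with simple arithmetic. The sketch below is only a bookkeeping aid: the helper names are hypothetical, and the numbers are those stated above.

```python
def run_volume_ml(flow_ml_per_min, minutes):
    """Buffer volume passed through the column at a constant flow rate."""
    return flow_ml_per_min * minutes

def neat_volume_ul(bolus_ul, percent_vv):
    """Volume of undiluted reagent contained in a bolus of a given % (v:v)."""
    return bolus_ul * percent_vv / 100.0

def amount_pmol(volume_ul, conc_um):
    """Amount of solute: µl x µmol/l = 1e-6 l x 1e-6 mol/l = 1e-12 mol = pmol."""
    return volume_ul * conc_um

buffer_before_protein = run_volume_ml(0.25, 16)   # ml run after the glutaraldehyde bolus
glutaraldehyde_neat = neat_volume_ul(200, 0.25)   # µl of neat glutaraldehyde injected
protein_loaded = amount_pmol(200, 3)              # pmol of VAR2CSA FCR3 injected

print(buffer_before_protein, glutaraldehyde_neat, protein_loaded)
```

The first value reproduces the 4 ml of buffer quoted for the 16-min run at 0.25 ml min−1.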

VAR2CSA–CSA complex reconstitution

The CSA sodium salt from bovine trachea (Sigma) was dissolved in buffer A to 10 mg ml−1. Then VAR2CSA was mixed with CSA at a molar ratio of 1:4. The mixture was incubated on ice for 30 min before EM grid preparation.

Cryo-EM grid preparation and data collection

The homogeneity of samples was first assessed by negative-stain EM with 0.7% (w:v) uranyl formate or 1% uranyl acetate as described67. Before preparing grids for cryo-EM, the freshly purified protein sample was centrifuged at 13,000g for 2 min to remove potential protein aggregates, and the protein concentration was measured with a NanoDrop spectrophotometer (Thermo Fisher Scientific). The final protein concentration used for cryo-EM grid preparation was 0.8 mg ml−1.

The protein sample was kept on ice before grid preparation. A 3.5-µl aliquot of protein was applied to a Quantifoil 300 mesh 1.2/1.3 carbon grid that had been glow discharged for 90 s at 10 mA with a PELCO easiGlow Glow Discharge Set. The VAR2CSA FCR3 and crosslinked VAR2CSA FCR3 samples were blotted for 3 s and the VAR2CSA NF54–CSA complex was blotted for 2 s, with a blot force of 3 using 55/20 mm filter paper (TED PELLA), before being plunged into liquid ethane with a Vitrobot Mark IV (FEI) set at 16 °C and 100% humidity. After screening multiple grids, three grids made with the samples VAR2CSA NF54 in complex with CSA, VAR2CSA FCR3 alone and the crosslinked VAR2CSA FCR3 were chosen for data collection based on the evaluation of data quality.

The NF54 + CSA and FCR3 datasets were collected on the 300-keV Titan Krios with a Gatan BioQuantum Image Filter in the National Institutes of Health (NIH) National Cancer Institute (NCI)/NIH IRP Cryo-EM Facility (NICE). The images were recorded with a 20-eV slit post-GIF K2 Summit camera in super-resolution counting mode at a nominal magnification of 130,000× and a defocus range from −0.7 to −2.0 µm. Exposures of 8 s were dose fractionated into 40 frames (200 ms per frame), with an exposure rate of 8 electrons pixel−1 s−1, resulting in a total exposure of 57 electrons Å−2. The data collection was automated using the SerialEM software package68.

The FCR3 crosslink dataset was collected on Titan Krios electron microscopes in the NIH Multi-Institute Cryo-EM Facility (MICEF). The images were recorded with a K2 Summit camera equipped with a Gatan Quantum LS imaging energy filter, with the slit width set at 20 eV in counting mode at a nominal magnification of 130,000× and a defocus range from −1.0 to −2.0 µm. Exposures of 10 s were dose fractionated into 50 frames (200 ms per frame), resulting in a total exposure of 71.2 electrons Å−2. The data collection was automated using the Leginon software package69.
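The exposure settings for the two sessions can be checked for internal consistency with a short calculation. The sketch below is illustrative only. The pixel size is not stated in this section, so the value computed here is simply what the reported per-pixel and per-area doses imply, and 71.2 electrons Å−2 is read as the total exposure for the crosslinked dataset. The helper name `n_frames` is hypothetical.

```python
import math

def n_frames(exposure_ms, frame_ms):
    """Number of frames when an exposure is split into equal frame times."""
    if exposure_ms % frame_ms != 0:
        raise ValueError("exposure is not a whole number of frames")
    return exposure_ms // frame_ms

# NF54 + CSA and FCR3 datasets: 8 s exposure, 200 ms per frame.
frames_a = n_frames(8000, 200)
electrons_per_pixel = 8.0 * 8.0          # 8 e-/pixel/s accumulated over 8 s
total_dose_a = 57.0                      # e-/A^2, as reported
dose_per_frame_a = total_dose_a / frames_a

# Assumption: pixel size inferred from (e-/pixel) / (e-/A^2) = A^2 per pixel.
implied_pixel_size = math.sqrt(electrons_per_pixel / total_dose_a)

# Crosslinked FCR3 dataset: 10 s exposure, 200 ms per frame.
# Assumption: 71.2 e-/A^2 is the total exposure over the 10 s.
frames_b = n_frames(10000, 200)
dose_per_frame_b = 71.2 / frames_b

print(frames_a, round(dose_per_frame_a, 3), round(implied_pixel_size, 2))
print(frames_b, round(dose_per_frame_b, 3))
```

Under these assumptions, both sessions deliver a similar dose per frame, and the frame counts match the 40 and 50 frames reported.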

Image processing

We collected 6,196 dose-fractionated videos of VAR2CSA NF54 + CSA. The processing was done within cryoSPARC (v.2.14.2)70. Motion correction was done by cryoSPARC’s Patch motion correction with an output F-crop factor of one-half. CTF parameters for each micrograph were estimated with Patch CTF estimation. Particles were autopicked from each micrograph with the blob picker from cryoSPARC and then sorted by two rounds of two-dimensional (2D) classification to exclude bad particles; 858,299 particles were selected. The particles were used to generate a map from scratch in cryoSPARC. Particles were classified into five classes using the low-pass-filtered (30 Å) map from scratch as a template. Classes 1 and 4, with a total of 299,571 particles and a clear map of the core region, were selected for NU-refinement, which generated a 3.5 Å map. A mask covering the core regions was then used to perform local refinement, which generated a 3.36 Å map. The map of the core was locally filtered with a B-factor of −76.4 Å² in Fig. 1c. Class 1 alone (157,702 particles), which showed clear density for the whole protein, was selected for NU-refinement, which generated a 3.87 Å map of the full-length complex. A mask covering the DBL5ɛ and DBL6ɛ regions was then used to perform local refinement, which generated a 4.88 Å map.
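The particle counts reported for the NF54 + CSA dataset can be summarized as a selection funnel. The sketch below is a bookkeeping aid rather than part of the processing pipeline: it only uses the counts given above, and the variable names are hypothetical.

```python
# Particle counts as reported for the VAR2CSA NF54 + CSA dataset.
after_2d_classification = 858_299   # kept after two rounds of 2D classification
core_classes_1_and_4 = 299_571      # used for the core map
full_length_class_1 = 157_702       # used for the full-length map

# Class 4 is not reported on its own; it follows by subtraction.
class_4_only = core_classes_1_and_4 - full_length_class_1

core_fraction = core_classes_1_and_4 / after_2d_classification
full_length_fraction = full_length_class_1 / after_2d_classification

print(f"class 4 alone: {class_4_only} particles")
print(f"core map uses {core_fraction:.1%} of the 2D-selected particles")
print(f"full-length map uses {full_length_fraction:.1%} of the 2D-selected particles")
```

Roughly one-third of the 2D-selected particles contribute to the core map and fewer than one-fifth to the full-length map, in line with the lower resolution reported for the full-length complex.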

In addition, 100,108 dose-fractionated videos of VAR2CSA FCR3 were collected on a 300-kV Titan Krios (FEI) equipped with a K2 Summit direct electron detector (Gatan). Similarly, the processing was done within cryoSPARC (v.2.14.2)70. Motion correction was done by cryoSPARC’s Patch motion correction with an output F-crop factor of one-half. CTF parameters for each micrograph were estimated with Gctf (v.1.06: https://www2.mrc-lmb.cam.ac.uk/research/locally-developed-software/zhang-software/)71. Particles were autopicked from each micrograph with the blob picker from cryoSPARC and then sorted by two rounds of 2D classification to exclude bad particles; 783,088 particles were selected. The particles were used to generate a map from scratch in cryoSPARC. Particles were classified into 10 classes using the low-pass-filtered (30 Å) map from scratch as a template. Class 4, with a total of 271,442 particles, was selected for NU-refinement, which generated a 4 Å map. A mask covering the DBL5ɛ and DBL6ɛ domains was then used to perform local refinement, which generated a 4.69 Å map.

We also collected 4,739 dose-fractionated videos of VAR2CSA FCR3. The processing was also done within cryoSPARC70. Full-frame motion correction was done by cryoSPARC’s own implementation. CTF parameters for each micrograph were estimated with Gctf. Then, 2,010,465 particles were autopicked from the micrographs with the blob picker from cryoSPARC and sorted by two rounds of 2D classification to exclude bad particles; 505,409 particles were selected. The particles were used to generate a map from scratch in cryoSPARC. Particles were classified into three classes using the low-pass-filtered (30 Å) map from scratch as a template. Class 1, with a total of 319,520 particles, was selected for NU-refinement, which generated a 3.52 Å map. A mask covering the core regions was then used to perform local refinement, which generated a 3.38 Å map.

Model building and refinement

We first built the model for the core of crosslinked VAR2CSA FCR3. The crystal structure of DBL3X+4ɛ (Protein Data Bank (PDB) accession no. 4P1T)46 was used as a starting model and was fitted and refined into the cryo-EM density map with PHENIX (v.1.18.2) ‘Dock in map’ and ‘Real-space refinement’72. The successful docking and the clear fit of the DBL3X+4ɛ side chains to the density indicated that the placement was correct. The clear density of an α-helix (ID3) that connects the C-terminus of DBL4ɛ with the flexible arm, which has density for two tandem DBL domains, helped us confirm that the core is made up of DBL1X to DBL4ɛ, whereas the arm consists of DBL5ɛ and DBL6ɛ. The structures of DBL1X, DBL2X and ID2b were predicted with Phyre2 (http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index)73 and then fitted and refined into the map with PHENIX (v.1.18.2). The missing regions were manually built in COOT (v.0.9)74. The atomic model for the core was refined using phenix.real_space_refine global minimization (default), morphing and simulated annealing rama potential75.

The model of the crosslinked VAR2CSA core was used to build the VAR2CSA FCR3 structure by docking the model into the VAR2CSA FCR3 map and auto-refining it with PHENIX. To build the arm region of VAR2CSA, we used the crystal structure of DBL6ɛ (PDB accession no. 2Y8D)47 and a DBL5ɛ structure predicted with Phyre2. The structures were fitted into the cryo-EM density map from local refinement with Chimera using the ‘fit in map’ tool76. The atomic model for the arm was refined using phenix.real_space_refine global minimization (default), morphing and simulated annealing rama potential.

The models of the core and arm regions of the CSA–VAR2CSA NF54 complex were built separately by fitting the corresponding models of VAR2CSA FCR3 into the maps, manually mutating residues and adjusting fragments. The CSA model was built with the C4S tetrasaccharide from the structure of the Shh–chondroitin-4-sulfate (C4S) complex (PDB accession no. 4C4M)77. The atomic models were refined using phenix.real_space_refine global minimization (default), morphing and simulated annealing rama potential. We then combined the models of the core and arm by fitting the two maps together.

Structural and map figures were prepared in Chimera (v.1.13.1, https://www.cgl.ucsf.edu/chimera/)76 and ChimeraX (v.1.0, https://www.rbvi.ucsf.edu/chimerax/)78, which are developed by UCSF, and PyMOL (v.2.1, https://pymol.org/2/).

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.