INTRODUCTION

The SOX protein family is made of transcription factors harboring a high-mobility-group (HMG) domain at least 50% similar to that of SRY (encoded by the sex-determining region on the Y chromosome).1 This domain mediates DNA binding and bending, nuclear trafficking, and protein–protein interactions. The 20 SOX proteins existing in humans and other mammals fall into eight groups (SOXA to SOXH) based on sequence identity within and outside this domain.2,3 Most have been shown in animal models to play pivotal roles in determining the lineage choice, differentiation program, and survival capacity of discrete cell types, such that as a whole the SOX family controls many crucial biological processes, including sex determination, neurogenesis, and skeletogenesis.1 In humans, pathogenic variants in half of the SOX genes have been shown to date to cause developmental disorders.4 For example, SRY variants cause XY sex reversal (MIM 400045 and 400046);5 SOX9 variants cause campomelic dysplasia with or without XY sex reversal (MIM 114290);5 SOX18 variants cause hypotrichosis–lymphedema–telangiectasia syndrome (MIM 607823 and 137940);6 and SOX4 and SOX11 (MIM 615866) variants cause Coffin–Siris syndrome–like syndromes.7,8 Most pathogenic variants are de novo and, except for SRY, result in dominant disorders because of gene haploinsufficiency.

Lamb–Shaffer syndrome (LAMSHF, MIM 616803) was initially described as a condition caused by de novo deletions ranging from a few kilobases to several megabases and including at least part of SOX5.9 LAMSHF is clinically characterized by developmental delays, language and motor deficits, intellectual disability, behavioral disturbances including autistic traits, and other, partially penetrant features.9,10,11,12 SOX5 is located on chromosome 12p12.1 and gives rise to at least five transcript isoforms through expression from different promoters, alternative start site usage, and alternative precursor messenger RNA (pre-mRNA) splicing. The longest isoform (NM_006940) encodes a 763–amino acid protein (originally referred to as L-SOX5, but more recently and henceforward called SOX5) and is the predominant brain isoform.13 The shortest isoform (NM_178010) encodes a protein corresponding to the L-SOX5 C-terminal half and is testis-specific. All long protein isoforms contain the same functional domains and are collectively critical in mouse development.14 Sox5-/- mice are born with lethal skeletal malformations and with defective deep-layer cortical projection neurons, while Sox5+/- mice have a normal lifespan and no obvious abnormalities.15,16,17,18

To date, only a few SOX5 point variants, mostly introducing premature termination codons, have been reported in LAMSHF patients19,20,21 or in large genetic studies of developmental disorders without detailed clinical descriptions.22,23,24,25 In this study, we describe 41 unpublished patients carrying various SOX5 deletions and point variants, including 16 with missense variants. We delineate more precisely the clinical spectrum associated with SOX5 alterations, aim at establishing genotype–phenotype correlations, and explore pathogenicity of selected variants using both in silico and functional approaches.

MATERIALS AND METHODS

Human subjects

We collected clinical and molecular data from patients with SOX5 microdeletions or point variants through GeneMatcher,26 DECIPHER27 (patient IDs 333039, 340665, 271393, 264625), and clinical networks. Referring physicians used standard developmental scales and filled out a table with detailed developmental, neurological, and behavioral history, including imaging and electroencephalogram (EEG) data where available. The study was approved by INSERM (RBM C12-06). We obtained informed written consent for all genetic studies as well as for the use of photographs shown in Fig. 2g.

Genetic studies

Diagnostic laboratories performed genetic tests on blood samples using microarrays or next-generation sequencing (Supplementary Table 1). SOX5 variants and deletions were validated and searched for in parents using Sanger sequencing and fluorescence in situ hybridization (FISH) or real-time polymerase chain reaction (PCR), respectively. SOX5 variants were described based on the longest isoform (NM_006940.5) using Alamut 2.11 (Interactive Biosoftware, France) and Human Genome Variation Society guidelines (www.hgvs.org/mutnomen). The InterVar interface was used to classify SOX5 variants with adjusted criteria according to American College of Medical Genetics and Genomics (ACMG) recommendations.28,29 Combined annotation dependent depletion (CADD) scores30 were calculated for each variant (Supplementary Table 1). SOX5 isoforms and promoters and other SOX sequences were retrieved from the National Center for Biotechnology Information (NCBI) and Fantom5 databases, and sequences were aligned using ClustalW (MacVector16 software). The effects of missense variants on protein structure and function were predicted using HOPE31 and Swiss-Model.32 SOX5 variants were queried in human populations using gnomAD. Data were statistically analyzed using Fisher’s exact and Wilcoxon–Mann–Whitney tests.
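For readers unfamiliar with Fisher's exact test, the following is a minimal sketch of the two-sided test on a 2×2 table, written with the Python standard library only. It is an illustration, not the software used in this study; the function name and the example counts are hypothetical and are not taken from the patient data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts (NOT study data): a feature present/absent
# in two variant groups.
p_value = fisher_exact_two_sided(4, 11, 2, 9)
print(f"two-sided p = {p_value:.3f}")
```

The small tolerance in the comparison guards against floating-point ties between equally likely tables.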

SOX5 plasmids

Expression plasmids for the longest SOX5 isoform and variants thereof were generated in the pKTol2C-EGFP plasmid.33 The EGFP sequence was replaced with custom-synthesized or PCR-amplified SOX5 sequences (primers are available upon request). Plasmid integrity was verified using Sanger sequencing.

SOX5 immunolocalization

HEK-293 cells (ATCC® CRL-1573) were plated on glass coverslips and transfected with pKTol2C-SOX5 plasmids (2 μg) and Lipofectamine 2000 Transfection Reagent (Thermo Fisher Scientific). Two days later, they were stained using Image-iT™ LIVE Plasma Membrane and Nuclear Labeling Kit (Thermo Fisher, I34406), fixed in 4% paraformaldehyde, permeabilized with 0.1% Triton X-100 in PBS (PBST), and blocked in PBST supplemented with 1% BSA and 22.5 mg/ml glycine. They were then incubated with rabbit polyclonal SOX5 antibody (1:200, Abcam, ab94396), followed by goat anti-rabbit antibody (1:500, Alexa Fluor 488, Invitrogen, A27034). After mounting in DAPI-containing Vectashield antifade mounting medium (Vector Laboratories), cells were imaged by confocal laser scanning microscopy (Zeiss LSM 780; 100× objective).

Western blot, electrophoretic mobility shift, and dimerization assays

HEK-293 cells were plated in six-well dishes and transfected eight hours later with empty or SOX5 expression plasmid (1 µg) and FuGENE6 (3 μl, Promega). The next day, extracts were prepared using NE-PER Nuclear and Cytoplasmic Extraction Reagents (Thermo Fisher Scientific) and tested by western blotting using SOX5 antibody (1:1000) and horseradish peroxidase–conjugated goat anti-rabbit IgG (1:5000, Vector Biolabs). Signals were visualized using ECL Prime Western Blotting Detection Reagents (Amersham). Electrophoretic mobility shift assay (EMSA) was conducted using the same extracts, 10 fmoles [α-32P]-dCTP-labeled 2HMG probe and 1 μg poly(dG-dC).poly(dG-dC), as described.34 Homodimerization was tested in western blots following cell extract incubation for 10 minutes with 0.01% glutaraldehyde.

Reporter assay

HEK-293 cells were transfected with FuGENE6 containing 150 ng pSV2βGal, 500 ng Acan [4xA1]-p89Luc reporter, 50 ng SOX9 expression plasmid, and 300 ng plasmid encoding no protein, wild-type (WT) SOX5, and/or variant SOX5, as previously described.35 Forty hours later, cells were collected in Tropix Lysis buffer (Applied Biosystems) with protease inhibitor cocktail (Thermo Fisher Scientific) and tested using Dual-Light luciferase and E. coli β-galactosidase assays (Thermo Fisher Scientific). Reporter activities were calculated as means with standard deviation of luciferase values measured for triplicates and normalized for transfection efficiency using β-galactosidase values.
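The normalization described above can be sketched as follows. This is a minimal illustration assuming that each well's luciferase reading is divided by the β-galactosidase reading from the same well before averaging the triplicates; the function names and the readings are hypothetical, not measured values.

```python
from statistics import mean, stdev


def normalized_activity(luciferase, beta_gal):
    """Divide each well's luciferase reading by its beta-galactosidase reading."""
    if len(luciferase) != len(beta_gal):
        raise ValueError("each luciferase reading needs a matching beta-gal reading")
    return [luc / gal for luc, gal in zip(luciferase, beta_gal)]


def summarize(luciferase, beta_gal):
    """Return (mean, sample standard deviation) of the normalized replicates."""
    ratios = normalized_activity(luciferase, beta_gal)
    return mean(ratios), stdev(ratios)


def fold_over_reference(condition_mean, reference_mean):
    """Express a condition relative to a reference condition (e.g., SOX9 alone)."""
    return condition_mean / reference_mean


# Hypothetical triplicate readings in arbitrary units (NOT study data).
sox9_alone = summarize([1200.0, 1100.0, 1300.0], [50.0, 48.0, 52.0])
sox9_plus_sox5 = summarize([5200.0, 4900.0, 5500.0], [51.0, 49.0, 53.0])
print(f"fold over SOX9 alone: {fold_over_reference(sox9_plus_sox5[0], sox9_alone[0]):.2f}")
```

Normalizing per well before averaging keeps well-to-well differences in transfection efficiency from inflating the standard deviation.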

RESULTS

The SOX5 variant spectrum associated with LAMSHF includes missense variants

We collected genetic and clinical information from 41 patients (Table 1, Supplementary Table 1). Eight patients (D1–D8), representing seven families, carried novel pathogenic microdeletions. These microdeletions ranged from 43.7 kb to 1.7 Mb and involved different breakpoints (Fig. 1a). While the largest deletion encompassed the entire SOX5 gene and its 5’ neighbor (BCAT1), the others were restricted to various segments of SOX5.

Table 1 Summary of genetic and clinical data
Fig. 1

SOX5 variant spectrum associated with Lamb–Shaffer syndrome (LAMSHF). (a) Location of genetic alterations identified in patients in this study. SOX5 transcript isoforms are labeled with National Center for Biotechnology Information (NCBI) accession numbers. Boxes 1 to 15, coding exons of isoform NM_006940. 5’ and 3’UTR: 5’ and 3’ untranslated sequences. p1 to p11 represent SOX5 promoters listed in the Fantom5 database; p1 and p2 (in bold) are the main promoters driving SOX5 expression in brain. CC, coiled-coil domain. Double-arrowed lines, deletions in patients D1–D8. Point variants, labeled as indicated. (b) Location of point variants reported here (above) and previously (below) on the longest SOX5 isoform. Protein and domain residue boundaries are indicated underneath the schematic. Red, nonsense and frameshift variants. Blue and green, missense variants within and outside the HMG domain, respectively. Superscripts, references.

The other 33 patients belonged to 31 families and carried 23 distinct point variants. Nineteen of these variants were classified by the InterVar interface as pathogenic or likely pathogenic (P1–P29) and the other four as variants of unknown significance (VUS) (V1–V4). Two patients had indels introducing frameshifts (P7 and P10) (Table 1, Fig. 1a, b). Two (P13 and P14) had variants altering the acceptor and donor splice sites of the coding exon 12, respectively. Thirteen patients (including a pair of dizygotic twins) carried eight distinct nonsense variants (P1–P6, P8, P9, P11, P12, P15, P25, and P28). Truncating variants (i.e., nonsense, splice site, and frameshift variants) were scattered over the L-SOX5 isoform from the N-terminus to the middle of the HMG domain. All truncating SOX5 variants thus encode proteins lacking DNA-binding ability. Furthermore, since all variants spare the last exon, they likely trigger nonsense-mediated mRNA decay and thus prevent protein expression. Sixteen patients (including a sib pair) had 11 different missense variants (P16–P24, P26, P27, P29, and V1–V4). Seven of these variants were clustered in the HMG domain, while the four VUS occurred in the first coiled coil (V1), between the coiled coils (V2), or after the HMG domain (V3 and V4).

Five identical nucleotide transitions were identified in several unrelated individuals: c.622C>T, p.Gln208* (P2 and P3); c.637C>T, p.Arg213* (P4/P5 and P6); c.1477C>T, p.Arg493* (P11 and P12); c.1678A>G, p.Met560Val (P16 and P17); and c.1711C>T, p.Arg571Trp (P19 to P23). Besides a few that were of unknown inheritance, these alterations were all de novo and thus suggested the presence of hot spots for nucleotide transitions. Of additional note, 17 of 22 single-nucleotide variants identified in patients are C>T and G>A transitions, suggesting that many SOX5 point variants result from cytosine deamination, a prevailing mechanism of genetic alteration.36

High rate of parental mosaicism

Most microdeletions and variants predicted to be pathogenic or likely pathogenic were undetected in parental blood samples, suggesting de novo occurrence (25/34 families, 74%). However, in each of three families, the same alteration was found in two affected siblings (D6 and D7; P4 and P5; and P22 and P23), but not in their parents, and in two other patients (D3 and P9), the variant was present at low levels in maternal blood. In addition, one nonsense variant was transmitted to a patient (P1) from his affected mother, where it was de novo. Variant transmission could not be determined for four patients (D1, P7, P12, P19) due to unavailability of parental samples. These findings thus indicate that pathogenic LAMSHF variants are frequently inherited from a mosaic parent (5/34, 15%) and also occasionally from an affected parent (1/34, 3%).
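Because these proportions rest on 34 families, their uncertainty is substantial. The sketch below recomputes the percentages quoted above and adds a Wilson score interval for each; the interval is an illustrative addition that was not part of the reported analysis, and the function name is hypothetical.

```python
from math import sqrt


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts as stated in the text (families with pathogenic or likely
# pathogenic alterations, n = 34).
inheritance = {
    "apparently de novo": 25,
    "mosaic parent": 5,
    "affected parent": 1,
}
for label, k in inheritance.items():
    low, high = wilson_interval(k, 34)
    print(f"{label}: {k}/34 = {100 * k / 34:.0f}% (interval {100 * low:.0f}-{100 * high:.0f}%)")
```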

Wide clinical spectrum associated with SOX5 pathogenic alterations

Excluding the four patients with VUS, our patient series comprised 20 females and 17 males (Supplementary Table 1). The patients were 12.2 years of age on average at the time of examination (median: 8.0 years, range: 1.75–36), with 11 older than 15 and six younger than 4.

For most patients, pregnancy and delivery were unremarkable (21/36), birth measurements (weight, length, and head circumference) normal (15/18 for whom full information was available), and the neonatal period uneventful (25/36). Eight patients had mild growth retardation or a small head at birth, two were hypotonic, and three had feeding difficulties.

Developmental delay was present in all patients for whom information was available. Although more than half of the patients acquired the sitting position on time (≤9 months; n = 16/29), the age of walking was delayed in all but one (>18 months; n = 35/36), without clear timing differences among variant categories (Fig. 2a, b). The age of first words was delayed in 21/26 patients (>12 months; mean: 29.9 months, range: 10–60 months). The delay was significantly less pronounced in patients with missense variants (mean: 22.4 months, n = 11) than in those with deletions and truncating variants (mean: 35.2 months, n = 15; p value: 0.04, Wilcoxon rank sum test) (Fig. 2c). The levels of verbal expression were variable, but most patients older than three years could make short or full sentences (Supplementary Tables 1 and 2).
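The group comparison of age at first words can be illustrated with a minimal Wilcoxon rank sum (Mann–Whitney U) sketch. It assumes a normal approximation without tie or continuity correction, which may differ from the exact procedure used for the p value reported above. The ages below are hypothetical placeholders; the per-patient values are in Supplementary Table 1.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """U statistic for x: count of (x_i, y_j) pairs with x_i > y_j; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def two_sided_p_normal(u, n1, n2):
    """Two-sided p value from the normal approximation (no tie correction)."""
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return erfc(abs(z) / sqrt(2))


# Hypothetical ages at first words in months (NOT study data).
missense = [12, 15, 18, 20, 24, 30]
truncating_or_deletion = [18, 24, 30, 36, 42, 48, 60]

u = mann_whitney_u(missense, truncating_or_deletion)
p = two_sided_p_normal(u, len(missense), len(truncating_or_deletion))
print(f"U = {u}, approximate two-sided p = {p:.3f}")
```

With groups this small, an exact test or a tie-corrected approximation from a statistics package would be preferable in practice.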

Fig. 2

Patients exhibit similar clinical features regardless of the SOX5 alteration type. Box plots showing comparative distribution of ages at (a) sitting unsupported, (b) walking unsupported, and (c) first words for patients with deletion, truncating, and missense variants. (d) Number of patients with normal to borderline cognitive abilities and various degrees of intellectual disability (ID). (e) Number of patients with autism spectrum disorder (ASD) or other behavioral disturbances. (f) Number of patients with seizures. (g) Facial profiles of individuals with de novo SOX5 variants. Above: D2 at age 10 years; D6 at age 26 years; D8 at age 24 years. Center: P1 at age 2 years, and his mother (41 years old); P6 at age 19 years; P10 at age 8 years. Below: P13 at ages 2 years, 6 months, and 11 years, 4 months, respectively; P14 at ages 2 years, 4 months and 8 years, respectively; P25 at age 4 years; P28 at age 5 years. Common facial features include broad or full nasal tip, thin upper lip and/or full lower lips, small jaw or prominent chin, prominent upper incisors and epicanthus.

Intellectual disability (ID) was reported in 30/33 patients, with 27 having mild-to-moderate ID and 3 having severe or moderate-to-severe ID. The three patients without ID had learning difficulties and either borderline functioning or discrepant verbal/performance IQ scores. No significant correlation was observed between degree of ID and variant type (Fig. 2d).

Of 25 patients evaluated for autism spectrum disorder (ASD), 6 (4 with truncating variants and 2 with missense variants) were positively diagnosed (24%) and 11 had other behavioral disturbances including stereotypies, isolation, tantrums, and hyperactivity (Fig. 2e). Of 36 patients, 8 experienced epileptic seizures (22%), but 5 of these had only one or two episodes and did not require medication. One of these patients (D6) had seizures triggered by environmental photosensitivity, an unusual finding in a “developmental delay plus seizures” syndrome (Supplementary Fig. 1). No correlation was found between the occurrence of seizures and the SOX5 variant type (Fig. 2f).

Clinical examination revealed that stature and weight were within normal range for most patients. Head circumference of both males (n = 14) and females (n = 15) was in the low but normal range (~−1.5 SD) while two patients (P14 and P23) had microcephaly. Hypotonia was reported in 22 patients, and five had additional neurological features, including ataxia (n = 2) or pyramidal syndrome (n = 3). Thirty-one patients had mild dysmorphic facial features, including broad/full nasal tip (n = 9), thin upper lip or full lips (n = 8), small jaw or chin (n = 5), long face (n = 3), or epicanthus (n = 3). Strabismus was reported in 13 patients, optic atrophy in 5, and amblyopia or cortical visual impairment in 1 each. Except for thin optic nerves, brain magnetic resonance image (MRI) scans were normal or showed nonspecific anomalies. Besides dysmorphic facial features, other skeletal malformations included scoliosis in six patients, thoracic kyphosis and hip dysplasia in one patient each, and fused cervical vertebrae in two patients (Supplementary Table 1). Malformations of other organs were rare and restricted to individual patients. Again, no correlation was found between the occurrence of these features and the variant types. Moreover, patients with recurrent variants (e.g., P2–P3: p.Gln208*, P4–P6: p.Arg213*, P16–P17: p.Met560Val, and P19–P23: p.Arg571Trp) exhibited considerable clinical variability, indicating that factors other than the SOX5 variants modulate the expression of the clinical phenotype.

SOX5 is tightly conserved in the general population

We used gnomAD, a genomic database for over 140,000 individuals who are theoretically unrelated and lacking severe pediatric disease, to investigate conservation constraints on SOX5 in humans.37 While 158 synonymous variants were predicted and 159 were observed (Z-score: −0.08), 42 loss-of-function variants were expected, but only 3 were observed (probability of loss-of-function intolerance [pLI] = 1). Moreover, 427 missense variants were predicted, but only 244 were observed (Z-score: 3.21). Thus, SOX5 is under tight conservation constraint in control populations. Interestingly, gnomAD synonymous variants were found for 10–29% residues both within and outside functional domains, whereas missense variants altered significantly fewer residues in the HMG domain (six residues, i.e., 7.5%) than in other regions (21–33%) and significantly fewer than synonymous variants (20%, p = 0.017) (Fig. 3a, b). The SOX5 HMG domain is thus highly constrained within control populations, which is in contrast to the relatively high prevalence of HMG domain missense variants observed in our patient cohort. The first coiled-coil domain also had significantly fewer missense variants (20.7%) than the regions outside of known functional domains (33.2%; p = 0.03), suggesting that this domain, which is required for SOX5 homodimerization and thereby for binding to pairs of recognition sites in target genes, is also under conservation constraint.
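The degree of depletion in each variant class can be made explicit as an observed/expected ratio. The sketch below uses only the counts quoted above; it is a simplified illustration and does not reproduce the gnomAD models behind the Z-scores or pLI. The variable and function names are hypothetical.

```python
# (observed, expected) variant counts for SOX5 as quoted in the text.
constraint_counts = {
    "synonymous": (159, 158),
    "missense": (244, 427),
    "loss_of_function": (3, 42),
}


def oe_ratio(observed, expected):
    """Observed/expected ratio; values well below 1 indicate depletion."""
    return observed / expected


ratios = {name: oe_ratio(*counts) for name, counts in constraint_counts.items()}
for name, ratio in ratios.items():
    print(f"{name}: observed/expected = {ratio:.2f}")
```

On these counts, synonymous variants occur at about the expected rate, missense variants at a little over half of it, and loss-of-function variants at well under a tenth of it.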

Fig. 3

Human SOX5 is under tight conservation constraint. (a) Distribution of synonymous and missense variants in SOX5 in gnomAD individuals. CC, coiled coil. (b) Percentages of residues carrying at least one synonymous or missense variant in the functional and other domains of SOX5 in gnomAD individuals. T-tests were performed to calculate the statistical significance of differences between protein domains. P values are indicated. (c) Alignment of all human SOX protein HMG domain sequences, with indication of residues altered in Lamb–Shaffer syndrome (LAMSHF) patients (red) and altered only in gnomAD individuals (purple). Asterisks, fully conserved residues. Dots, semiconserved residues. Colored triangles, residues important for DNA binding and bending. Brackets, H1, H2, and H3 α-helices. Continued lines linked with dotted lines, key amino acids in nuclear localization signal sequences (NLS) and nuclear export signal sequence (NES). (d) Alignment of human SOXD protein sequences outside the HMG domain that encompass residues altered in LAMSHF patients.

The six HMG domain missense variants found in gnomAD affected two of the same residues as in LAMSHF patients and four others, and all six occurred only once (Supplementary Table 5). In contrast, for missense variants located outside the HMG domain, we found four occurrences of the patient Arg235Cys variant (located in the first coiled coil) in gnomAD, and one for Ser693Leu. Other gnomAD variants affected the same residues as in patients, such as Arg235His, found in 11 individuals. These observations suggest that some SOX5 variants, especially those located outside the HMG domain, may be better tolerated than others.

In silico prediction of variant pathogenicity

To predict pathogenicity of SOX5 missense variants, we first examined the location and conservation of affected residues. Since all HMG domain residues are fully or semiconserved in SOX5 vertebrate orthologs (Supplementary Fig. 2a), we focused on human SOX protein paralogs. All HMG domain residues altered in patients and gnomAD individuals affected residues involved in DNA binding or bending, α-helical configuration, or nuclear trafficking (Fig. 3c). Interestingly, 3 of the 5 residues altered in patients (Met560, Asn561 and Arg571) were among 23 residues identical in all protein paralogs, Tyr605 was among 13 semiconserved residues, and only Ala596 was among the 40 nonconserved residues. Conversely, only two of the six residues altered in gnomAD individuals were among the conserved and semiconserved ones. Outside the HMG domain, patient variants affected residues that are highly conserved in SOX5 and its orthologs (Supplementary Fig. 2b). When the comparison was limited to human SOXD proteins (SOX5, SOX6, and SOX13), these conservation patterns held strongly for Arg235Cys, located in the first coiled-coil domain, and Thr632Asn, immediately flanking the HMG domain, but less strongly for residues located in functionally unknown regions (Fig. 3d). Together, these data suggested that all HMG domain variants and a few other patient variants might impact SOX5 function.

We then asked whether the HMG domain residues altered in LAMSHF patients also cause disease when altered in other SOX genes. Interestingly, all residues affected in LAMSHF patients were shown to cause gonadal dysgenesis or XY sex reversal when altered in SRY, or campomelic dysplasia with or without XY sex reversal when altered in SOX9 (Supplementary Table 6). In contrast, only two of the four variants found in gnomAD, but not in LAMSHF patients, were shown to cause disease when altered in SRY. These data further support pathogenicity of patient variants. They also suggest that some variants present in gnomAD individuals could be pathogenic, but clinical information was unavailable to validate this possibility.

Lastly, comparison of WT and variant residues using HOPE (Supplementary Fig. 3) showed that all variants differed from WT residues by at least one major structural feature: 16/18 differed in size, 13/18 differed in hydrophobicity, and 6/6 had a neutral instead of positive charge. All variants could thus affect the secondary structure and hence function of SOX5.

Overall, these analyses concurred that most missense variants identified in our patient series are likely pathogenic.

Truncating variants and missense variants located within or near nuclear import signals impair SOX5 translocation to the nucleus

We constructed expression plasmids for WT and variant forms of L-SOX5 and transiently transfected them in HEK-293 cells to explore the functional impacts of variants. Western blots of nuclear and cytoplasmic fractions (Fig. 4a) and cell immunostaining assays (Fig. 4b) showed that WT SOX5 localized primarily in the nucleus, as expected. On the contrary, expression of nonsense variants (Gln208*, Gln274*, Gly354*, and Arg493*) revealed that, if these variants were expressed in patients’ cells (i.e., if their mRNAs were not subjected to nonsense-mediated decay), they would be primarily cytoplasmic. This result was expected since protein truncation occurs before the nuclear translocation signals. All proteins with a missense variant that we tested were able to translocate into the nucleus, except those in which the variant occurred within or near the N-terminal nuclear import signal. Accordingly, the Met560Val variant was localized to both the cytoplasm and nucleus, and the Asn561His and Arg571Trp variants were mainly cytoplasmic. Cytoplasmic retention of these missense variants may thus contribute to pathogenicity.

Fig. 4

Subcellular localization and activities of SOX5 variants. (a) Western blots of cytoplasmic (C) and nuclear (N) extracts from HEK-293 cells transfected with plasmids encoding no protein (-), wild-type SOX5 (WT), or SOX5 variants. Blots were incubated with SOX5 antibody. Red boxes, SOX5-specific protein signals. Numbers, Mr of protein standards. (b) Representative images of SOX5 immunostaining (green signal) in HEK-293 cells transfected with plasmids encoding wild-type SOX5 (WT) or the indicated variants. Nuclei are seen in blue and plasma membranes in red. Scale bars: 20 μm. (c) Test of the abilities of SOX5 variants to synergize with SOX9 in transactivation. HEK-293 cells were transfected with Acan and pSV2βGal reporter plasmids and plasmids encoding no protein, SOX9, and/or SOX5. The WT SOX5 plasmid was used in the indicated amounts, and the variant plasmids at 150 ng. Reporter activities are presented as the mean ± standard deviation obtained for triplicates in one representative experiment. They were normalized for transfection efficiency and are reported as increase over the activity of SOX9 alone. (d) Test of the abilities of SOX5 variants to interfere with WT SOX5 in transactivation. HEK-293 cells were transfected essentially as described above. SOX5 variant plasmids were tested at 150 ng with 150 ng SOX5 WT plasmid. Reporter activities were calculated and are presented as described above. (e) Test of the abilities of SOX5 variants to bind DNA in electrophoretic mobility shift assay (EMSA). Extracts from HEK-293 cells transfected with empty, WT SOX5, or SOX5 variant plasmid were incubated with a 2HMG DNA probe. Top, X-ray film images. SOX5/DNA complexes migrated more slowly than nonspecific protein (non-sp.)/DNA complexes. Bottom, western blot showing similar amounts of all SOX5 proteins. (f) Dimerization assay with the same extracts as in (c) for no protein, WT SOX5, and the R235C variant. Western blots were performed using SOX5 antibody. SOX5 dimers ran in sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) with an apparent Mr twice as large as that of monomers.

Missense variants in the HMG domain prevent SOX5 from participating in transactivation

We tested the transcriptional activity of SOX5 variants by transfecting HEK-293 cells with an Acan reporter whose enhancer is synergistically activated by SOX9 and SOXD proteins.35 WT SOX5 increased transactivation by SOX9 in a dose-dependent manner (Fig. 4c). Nonsense and HMG domain missense variants exhibited little if any activity, whereas missense variants located outside the HMG domain had activity similar to WT. Since SOX5 variants are heterozygous in our patients, we also tested whether they could interfere with the activity of WT SOX5. Nonsense and HMG domain missense variants did not affect the activity of WT SOX5, and missense variants located outside the HMG domain increased the reporter activity as much as WT SOX5 (Fig. 4d). Thus, none of the variants showed a dominant-negative effect.

We then tested the DNA-binding ability of SOX5 missense variants in EMSA using whole-cell extracts from HEK-293 cells transfected with SOX5 plasmids and a probe avidly binding SOXD homodimers.38 HMG domain missense variants failed to bind DNA, whereas other missense variants efficiently bound DNA (Fig. 4e). This result also suggested that Arg235Cys, located in the main coiled-coil domain, can homodimerize effectively. Its ability to homodimerize was confirmed in an assay where closely interacting proteins were crosslinked with glutaraldehyde (Fig. 4f).

In conclusion, HMG domain missense variants prevented SOX5 from binding DNA and from participating in transcriptional activation, supporting their pathogenicity. On the contrary, variants located outside the HMG domain had no deleterious impact in the assays used, but this finding does not rule out that they could be pathogenic and alter other, untested SOX5 activities.

DISCUSSION

LAMSHF syndrome was previously described in just over two dozen patients. Most patients had deletions of at least part of SOX5, and a few had either a chromosomal translocation involving SOX5, or SOX5 nonsense or frameshift variants.9,10,11,12,19,20,21 Our patient series more than doubles the number of cases described in the literature and demonstrates that SOX5 missense variants clustering in the HMG domain can also cause LAMSHF syndrome. All variants were heterozygous, and most were predicted in silico and validated in vitro to be loss-of-function variants. This confirms that SOX5 haploinsufficiency is deleterious for neurogenesis and a few other developmental processes. Our study also revealed that parental mosaicism, found in at least 14% of families in our series, is relatively frequent in LAMSHF syndrome. This finding is important for genetic counseling and in line with increasing evidence that somatic, gonosomal, or gonadal mosaicism in parents may cause recurrence of neurodevelopmental disorders, apparently due to de novo variants.39 SOX5 and LAMSHF syndrome thus expand the list of such genes and disorders.

Our extended study allowed further definition of the LAMSHF clinical features. ID is mostly within the mild-to-moderate range, and some cases have specific cognitive deficits rather than ID.9 Delays in motor and language acquisition are observed in all patients and correlate with the level of ID. Behavioral disturbances are frequent and include ASD or autistic traits, as previously reported.9,10,40 Microcephaly is infrequent; yet, brain growth seems frequently mildly altered. Hypotonia is common, whereas other neurological features are infrequent. Our findings also suggest that SOX5 pathogenic variants predispose to epilepsy, with a prevalence an order of magnitude higher than in the general population. Seizures in SOX5 patients usually respond well to antiepileptic treatments and follow a benign course. Ophthalmologic features, including strabismus, optic nerve atrophy, amblyopia, and cortical visual impairment, are frequently observed9,19,22 and, together with rare skeletal malformations (i.e., scoliosis and fused cervical vertebrae), constitute corroborating rather than defining features of LAMSHF syndrome.9 The incomplete penetrance observed for some features suggests that SOX5 haploinsufficiency manifests differently in distinct individual genetic backgrounds or that some variants retain partial activity. The investigation of clinical features according to variant types, however, did not reveal clear genotype–phenotype correlations. Patients with HMG domain missense variants tended to have milder language deficits, but this finding requires confirmation with larger patient cohorts. Based on the lack of obvious genotype–phenotype correlations and on the observation of variable phenotype severity in unrelated individuals with identical SOX5 variants, we tentatively conclude that yet-unidentified factors significantly contribute to the penetrance and degree of disease severity.

We also describe in this study four patients with de novo variants located outside the HMG domain and altering amino acids conserved in SOX5 orthologs. However, the pathogenicity of these variants could not be established through functional assays, and it thus remains unclear whether and how these variants contribute to disease in these patients. Three of these patients (V2–V4) had phenotypic features compatible with LAMSHF syndrome (although patient V2 was very young at the time of the study and patient V3 mainly had ASD), whereas the fourth patient (V1) had Tourette syndrome. The variant identified in the latter patient (Arg235Cys) was also present in four gnomAD individuals from different ethnicities. Although Tourette patients are included in gnomAD “neuro” cohorts, the individuals with Arg235Cys were not in these cohorts, suggesting that these individuals had no obvious neurological phenotype. Further investigations are therefore warranted to investigate whether missense variants outside the HMG domain could impair untested activities of SOX5 and whether these variants could predispose to LAMSHF or Tourette syndrome.

In conclusion, our study demonstrates that the genetic and clinical spectrum in LAMSHF syndrome is much larger than previously described, and extends to missense variants clustering in the HMG domain. In silico and in vitro functional data support the concept that these missense variants are pathogenic by causing loss of function of the SOX5 transcription factor, and thereby reflect gene haploinsufficiency during neurogenesis and occasionally during other developmental processes. The impacts of variants located outside the HMG domain remain to be determined.