Introduction

Schizophrenia (SZ) and bipolar disorder (BD) are heritable brain disorders involving impairments in cognition, perception, and motivation that usually manifest late in adolescence or in early adulthood [1, 2]. The pathogenic mechanisms of these brain disorders are unknown, but pathological features involving excessive loss of gray matter have been reported [3], as well as reduced numbers of synaptic structures on neurons [4]. Treatments exist for the psychotic symptoms but no preventive or curative therapies are available [1]. So far, GWAS and family linkage studies have not provided breakthroughs in the illness genomic underpinnings that would be useful to derive therapies, despite genes being undisputedly involved in etiology [5]. New ways of approaching genomic studies and new theories must thus be imagined to decipher the pathophysiology [6, 7].

One major GWAS study has identified 108 loci in the human genome that contain single nucleotide polymorphism (SNP) haplotypes associated with SZ [8]. However, the functional alleles and mechanisms at these loci remain to be elucidated. The identification of genetic cis associations between SNPs and gene expression may help to find functionally relevant pathways that contribute to the etiology of psychiatric disorders [9]. Despite controversies, RNA and protein expression profiles from peripheral blood would present patterns that reflect those found in brain tissue [10,11,12,12], and gene expression of peripheral blood lymphocytes may discriminate SZ patients [13,14,15,16] as BD patients [17, 18] from controls. Gene expression has also been used in attempts to pinpoint the genes most likely to be the source of genetic signals. For instance, Fromer et al. [19] sequenced RNA from dorsolateral prefrontal cortex in SZ patients and found 693 genes showing differential expression of which the genes FURIN, TSNARE1, CNTN4, CLCN3, and SNAP91 were among the 108 GWAS loci identified previously [8]. Also, Lopez de Lara et al. have successfully combined linkage and expression studies performed in the same set of families [20]. Indeed, they screened 26 BD families with 811 microsatellite markers to identify the linkage regions in which gene expression could identify candidate genes. They found evidence of linkage to lithium-responsive BD on 3p25, 3p14, and 14q11 as well as dysregulated gene expression in these regions suggesting altered synaptic and mitochondrial function [20].

Altered DNA methylation has also been observed in SZ [21] in peripheral leukocytes [22]. For instance, methylation-wide analysis has successfully refined results from GWAS with the top three methylation findings located near genes SDCCAG8, CREB1, and ATXN7 [23]. Analysis of methylation also helped to refine functional/regulatory variations associated with SZ polygenic risk scores and genetic variants in it [24]. With methods combining evaluation of DNA methylation and gene expression levels from whole blood of 260 SZ patients and 250 controls, van Eijk et al. [25] identified 432 CpG sites with differential methylation levels that were also associated with differential gene expression. These findings suggest that inspecting concurrently gene expression and methylation might improve interpretation of findings.

We have already reported significant linkage results in SZ, BD, and both diseases in 12 chromosomal regions from 9 different chromosomes [26,27,28]. Since then, we have added markers and subjects to the 2005 sample [26] and have reanalyzed the data ending up with very similar updated signals in nine regions from eight different chromosomes (including six of the original regions) as presented in Supplementary Table S1. In all, the latter nine linked regions included some 2051 genes that were all candidates by position.

Herein, our objective was to follow up of these linkage findings and assess the likelihood that some of these 2051 genes be involved in SZ or BD, or in both (CL, common loci). First, as depicted in Supplementary Fig. S1, our multimodal follow-up involved differential gene RNA expression. Second, we selected genes with significant differences in RNA expression for a cis-eQTN (expression Quantitative Trait Nucleotide) analysis to identify SNP that could potentially regulate gene expression. Third, genes showing significant differential RNA expression and/or cis-eQTN analyses were investigated for DNA methylation in relation to gene expression, DNA variations, and phenotypic status. In each region the analysis was restricted to the phenotype yielding the strongest linkage signal. Among the genes in the nine linkage regions, our experimental procedure yielded the best multimodal convergence for ITGB5, SPCS3, and FZD3 located, respectively, on chromosomes 3q21, 4p34, and 8p21.

Materials and methods

Phenotype definition

A best-estimate lifetime DSM-IV diagnosis was made as outlined in previous reports [29,30,31]. In brief, all available information across lifetime from different sources (all medical records, family informant interviews, personal structured interview) was reviewed blindly by four research diagnosticians. To test our hypothesis, we also defined a “common locus” (CL) phenotype. The CL phenotype included BD, SZ, and the schizoaffective disorder (SAD). Our SZ definition included SZ plus schizophreniform disorder and schizotypal personality. The BD phenotype included both BD I and BD II, and recurrent major depression.

Gene RNA expression measurement

Sample

We analyzed family members from 29 of the original 48 kindreds in Maziade et al. [28], including 17 kindreds showing linkage signals with one or more of the nine linked chromosomal regions giving preference to the families with the greatest number of subjects. This represented 498 subjects: 156 cases affected by SZ (N = 58), SAD (N = 13), or BD (N = 85) and 342 non-affected relatives (NAR) (see Supplementary 1).

RNA expression in immortalized lymphocytes (IML)

The detailed protocol is reported in Supplementary 1. Briefly, RNA harvested from IML in culture for the 498 subjects was hybridized to HumanHT-12 Expression Beadchip v4 (47 K probes; Illumina; Génome Quebec). IML cultures were monitored to be harvested in exponential growth. RNA quality (RNA integrity number or RIN) was evaluated on a Bioanalyser with RNA 6000 Nano Kit (Agilent) and RIN values close to ten were generally observed.

Signal extraction from probe intensities

Probe intensities were background corrected and quantile normalized across all chips. Normalized intensities were then log2 transformed, and called log2-expression values from now on. A probe was declared expressed in a subject if the p value of the Illumina detection test was <0.05. We retained for analysis all probes that were expressed in 75% or more of the subjects from at least one family where at least four members had been measured, to account for the possibility of family-specific expression. Additional details are provided in Supplementary 1.

SNP genotyping

Whole-genome SNP genotyping was made using DNA extracted from IMLs by affinity column (Midi prep Qiagen) for the 498 subjects using Infinium Human OmniExpress24 Illumina microchips (700 K; Genome Quebec). Some SNPs were completed for the remaining subjects of the cohort using TaqMan kit (Life Technologies) and a real time PCR (LightCycler LC480, Roche) where some 10% blind replicates were added for genotyping quality control. Mendelian inheritance within families was also checked. In the case where more than 5% of non-replication or more than five non-Mendelian inheritance is observed, the marker was genotyped again completely after improving genotyping conditions (salt and DNA contents, reaction volume, annealing temperature).

Expression quantitative trait nucleotides (eQTNs) analysis

For the genes listed in Table 1, we evaluated if proximal SNPs within 50 kb of the gene were associated through the RNA expression of this gene (cis-eQTN) to the phenotype identified through linkage in its corresponding chromosomal region. See statistical analysis below for details.

Table 1 Most significant mean RNA expression ratio (MER) observed between affected and unaffected-relative subjects for the phenotype specific to each linked region

CpG analysis

We contrasted DNA methylation in DNA extracted from IMLs between NAR, SZ, and BD subjects according to the level of expression of the selected genes. For the validation run of the chosen CpGs, we selected nine subjects (three NAR, three SZ, and three BD) to evaluate the level of methylation. If DNA methylation was higher than 10% with a good quality call, we selected the first 10 subjects with the lowest, median and highest expression for each NAR, SZ, and BD groups according to adjusted expression values, taking care to avoid overlap between expression categories among phenotypic status. CpG islands were identified [32] and CpG sites selected for analysis. For ITGB5, we analyzed ten CpG sites within a 1.5 Kb region including five CpG islands located at 15 Kb from the 5′ end of ITGB5 (Isl 1), 5 CpGs located in a CpG island located in the first 5 Kb of the gene (Isl 2) and the CpG created by the SNP rs10934702:C > T, NC_000003.11:g.124653868 C > T (see Fig. 1a). For SPCS3, 13 CpG sites located in a CpG island of 721 bp located in the first 1 Kb of the 5′ region of the gene (Isl 1), four CpGs located in intron 3 (Isl 2), four CpGs located in the 3′ untranslated region of the gene (Isl 3), and the CpG created by the SNP rs7694145:A > G, NC_000004.11:g.177285894 A > G (see Fig. 1b) were evaluated in 24 SZ cases, 22 BD cases, and 126 NARs. Finally, for FZD3 three CpG sites located within 3, 22, and 31 nucleotides 5′ of the SNP rs1946583:T > C, NC_000008.10:g.28350753 T > C were analyzed (see Fig. 1c). Initial promising results led us to measure FZD3 methylation in all 58 SZ and 85 BD cases, and in a sample of 199 NARs enriched with subjects having low level of expression. After DNA was treated with bisulfite (EpiTect Fast DNA Bisulfite Kit, Qiagen), 10 ng of cleaned bisulfited DNA was used for PCR followed by DNA pyrosequencing (Pyromark MD, Qiagen). Primers used and sequences analyzed after bisulfite treatment can be found online in Supplementary 2. A biotin was added 5′ of the reverse primers to allow capture of a single DNA strand for pyrosequencing.

Fig. 1
figure 1

Genomic structure of the genes highlighted in the results section. a ITGB5, b SPCS3, and c FZD3. CpG islands for which methylation was measured are indicated in dark yellow. Single nucleotide polymorphisms (SNPs) are indicated by their reference sequence number. a, b SNPs marked with an asterisk create a CpG site; c SNPs marked with an asterisk showed association with SZ in a Chinese sample [37] (Color figure online)

Statistical analysis

Comparison of gene RNA expression levels between SZ/BD and NAR (analysis 1)

We compared the RNA expression levels of the 156 CL affected subjects to the 342 NARs, and of the 58 SZ and 85 BD subjects to the 342 NAR (after excluding SAD subjects) in separate linear mixed effects models of the log2-expression values of each probe, age at blood draw and sex included as fixed covariates and the Illumina slide and a polygenic effect as random effects to deal with kinship, in addition to the first 10 principal components of 51 housekeeping probes as fixed covariates to control unwanted variation [33]. Mean expression ratios (MERs) were computed as 2b, where b is the estimated coefficient for the phenotype group tested (CL, SZ, or BD). Wald tests were performed on these coefficients, and a Bonferroni correction was applied by multiplying the p values by the number of probes in the linked region.

eQTN analysis (analysis 2)

This analysis involved the same CL, SZ, BD, and NAR samples as the comparison of RNA expression levels above. We used the method of Zhao et al. [34] comprising (1) an outcome model, which is a logistic regression of phenotypic status (either CL, SZ, or BD against NAR) on the log2-expression values of the probe or probes in the gene or genes located near the eQTN, adjusted for age at blood draw, sex and the first 10 principal components of the housekeeping probes, and (2) a transcript model that is a linear regression between the linear predictor of the outcome model and the number of minor alleles of the eQTN. Since the linear predictor of the outcome model captures the association of the gene(s) expression with phenotype, the Wald test of the eQTN coefficient in the transcript model is a test of the eQTN association with phenotype mediated through gene expression. A Bonferroni correction was applied by multiplying the p values by the number of SNPs remaining in each gene after pruning the SNPs on the array so that no r2 exceeds 0.8 in windows of 20 SNPs using Plink (www.cog-genomics.org/plink2).

Comparison of CpGs methylation levels between SZ/BD and NAR (analysis 3)

For FZD3, where we measured methylation in a biased NAR sample, an unbiased estimate of the mean methylation level in NARs was obtained by inverse probability-of-selection weighting. We tested for differences in methylation levels between SZ, BD, and NARs using t-tests, accounting for subject weights in the computation of the standard errors in the denominator of the t statistics.

Ethics and data availability statements

The study was explained to each family member and a signed consent was obtained from the subjects or from a parent for children under 18, as reviewed by the Mental Health and Neuroscience Ethics Review Board of the Centre intégré universitaire de santé et des services sociaux de la Capitale-Nationale (reference numbers MP-13–1996–1, 068–1995).

The RNA expression dataset generated and analysed during the current study is available in the ArrayExpress repository (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-8018). The SNP genotype dataset generated and analysed during the current study is not publicly available due to absence of study participants consent to publication of their personal information including genotype, but is available from the corresponding author on reasonable request and establishment of a data sharing agreement.

Results

Differentially RNA expressed genes found in several linkage regions

For the phenotype yielding the greatest linkage signal in each chromosomal region (see Supplementary Table S1), Table 1 reports the differentially expressed genes showing the smallest p value of the chromosomal region. Among these genes (see Supplementary Table S2), integrin beta 5 (ITGB5), transcript variant 3, remained significant at the 5% level after adjusting for multiple testing in the 3q21 region (N = 201 probes; p = 0.000085, adjusted p = 0.017), and near significant if adjusted for all probes tested in all regions (N = 686; adjusted p = 0.058). Three other genes, ataxin 1 (ATXN1) at 6p22, golgin A7 (GOLGA7) at 8p11 and the uncharacterized transcript HS.121655 at 13q11 showed suggestive results with adjusted p values below p = 0.15.

eQTNs detected in a few differentially RNA expressed genes

Again for the phenotype yielding the greatest linkage signal in each region, Table 2 presents SNPs associated to the phenotype through the RNA expression of genes with nominally significant MER in analysis 1 using the method of Zhao et al. [34] (analysis 2). Complete results for all SNPs located within the linkage regions and all phenotypes are reported in Supplemental Tables S3–S5. This time, significant or suggestive associations after adjusting for multiple SNPs testing within 50 Kb of each gene were observed for the gene signal peptidase complex subunit 3 (SPCS3) located at 4q32 (adjusted p = 0.027), frizzled class receptor 3 (FZD3) located at 8p21 (adjusted p = 0.052), and ITGB5 at 3q21 (adjusted p = 0.077). We elected to pursue the analysis of these three genes for DNA methylation.

Table 2 eQTN for genes showing the strongest evidence of differential expression between affected and unaffected-relatives subjects for the phenotype specific to each linked region

No evidence of DNA methylation-phenotype association

Three genes were evaluated for DNA methylation. First, for ITGB5, the level of methylation for the CpGs of ILS1 site (Fig. 1a) was low (0% to 10%) and similar across phenotypes (Table 3A) and their analysis was not pursued. For the ISL2 ITGB5 site located closer to the 5′ end of the gene (Fig. 1a), we found higher levels of methylation with variations ranging from 40% to 100% (Table 3A). However, we detected no obvious DNA methylation differences between CL and NAR subjects. Finally, we observed for rs10934702:C > T, NC_000003.11:g.124653868 C > T around 40–45% and 85–90% methylation in the heterozygotes CT and the homozygotes CC, respectively, with no obvious differences between phenotypes (Table 3A). We concluded that these CpGs were not involved in differential regulation of ITGB5 expression between the CL and NAR phenotypes.

Table 3 Mean DNA methylation (%) observed for different CpGs island (Isl) located in ITGB5 (A), SPSC3 (B), and FZD3 (C) genes for NAR, BD, and SZ subjects

Second, in the case of SPCS3, the first investigated site (Fig. 1b) showed DNA methylation levels of <10% with poor quality calls (Table 3B) and was not pursued. The second SPCS3 CpG site showed very high level of methylation (75–100%) but with no obvious differences between groups, while the third SPCS3 CpG site showed moderate methylation ranging between 10 to 40% that was also similar between groups (Table 3). CpG created by the SNP rs7694145:A > G, NC_000004.11:g.177285894 A > G present variation in methylation from 0 to 70% with a greater methylation in NAR carrying the AG genotype than in BD (p = 0.004; Table 3). The greater methylation observed in NAR corresponded to a lower SPCS3 expression. However, a lower expression of SPCS3 was also observed in AA individuals where no methylation could be observed due to the absence of the CpG site with that genotype (results not shown).

Third, for FZD3, the three CpG sites exhibited a wide range of methylation levels (Table 3C). There was no significant difference in mean methylation level between BD and NARs (p = 0.89) and between SZ and NARs (p = 0.15) using analysis 3.

Discussion

Our present results build upon several complementary methods to follow-up on our linkage signals previously reported in a large kindred sample. This multimodal procedure has yielded information on association of SNPs with phenotypic status through RNA expression level of genes to approach the disease complexities. Supplementary Table S6 provides an overview of the positive and negative findings. For SPCS3, a greater RNA expression in BD patients than in their NARs appeared independent of the level of methylation at the SNP site. However, very few CpG sites were studied and a more exhaustive analysis using DNA methyl Seq, for example, could highlight functional CpG sites in relation to SZ/BD. We noted that none of the SNPs found associated with FZD3 or SPCS3 RNA expression (cis-eQTNs) would be significantly associated to the CL or BD phenotypes in a simple phenotype—genotype association analysis (results not shown). This highlights the value of the Zhao et al. [34] integrative genomics framework exploiting gene expression to detect disease susceptibility or protective variants when no clear genetic association is detected.

Our experimental approach detected a relatively low number of genes showing a significant difference in RNA expression between SZ or BD patients and their NARs. This contrasted with previous studies analyzing brain or whole blood in smaller or similar sizes of samples of subjects. Four limitations could explain our result. First, gene expression in leukocytes does not correlate perfectly with gene expression in brain cells involved in SZ and BD pathophysiology. We may have missed genes differentially expressed in the brain between patients and NARs but not expressed in leukocytes. Second, we have used cultured IMLs, in which gene expression should be relatively normalized by the similar and controlled culture conditions [35]. In that view, remaining differences between patients and NARs would then come from structural DNA variants and not epigenetic modifications [12]. However, important differences could have been attenuated by the homogeneous culture conditions where immediate environmental factors would have less or no influence to trigger or accentuate genetic effects. Third, we compared patients to NARs, not to unrelated controls. Patients and NARS share a large part of their genome including several gene RNA expression controlling elements, which might have attenuated differences in gene RNA expression. Fourth, our approach targeted solely variants modulating phenotype through gene RNA expression. It would thus have missed variants influencing SZ and BD risk through other routes. For instance, rs1156026:T > C, NC_000013.10:g.43726345 T > C on 13q13–q14, which previously showed an association with SZ in our same sample [36], did not come out in the present analysis because that SNP was not associated to the expression of any transcript present on the chip.

FZD3, a member of a family of genes that encodes a seven-transmembrane domain proteins that are receptors for the wingless type MMTV integration site family of signaling proteins, has already come out as a susceptibility locus for SZ in the Chinese population [37, 38], but only in this population according to a meta-analysis [39]. Congruently with this meta-analysis, when genotyping our sample for two of the SNPs associated with SZ in the Chinese sample (rs352203:C > T, NC_000008.10:g.28394701 C > T and rs2241802:A > G, NC_000008.10:g.28384712 A > G), neither of these SNPs was associated with SZ or to the RNA expression or the methylation of FZD3 in our sample (results not shown). However, FZD3 would be nonetheless implicated in SZ and BD based on molecular genetics and developmental studies of 484 annotated genes on chromosome 8p21 [40]. Knock-out animal models using CRISPER/CAS9 could be used to investigate a possible functional link between SZ/BD and FZD3.

In conclusion, despite the limitations explained above, our multimodal attempt focusing on dense SNP genotyping and RNA expression analyses to previously reported linkage findings, has the advantage of diminishing the multiple statistical testing burdens and allowed us to perform a deeper mining of candidate gene effects related to SZ or BD.