Novel Variance-Component TWAS method for studying complex human diseases with applications to Alzheimer’s dementia

Shizhen Tang; Aron S. Buchman; Philip L. De Jager; David A. Bennett; Michael P. Epstein; Jingjing Yang

doi:10.1101/2020.05.26.117515

Abstract

Transcriptome-wide association studies (TWAS) have been widely used to integrate transcriptomic and genetic data to study complex human diseases. For a test dataset lacking transcriptomic data, existing TWAS methods first impute gene expression as a weighted sum of SNP genotypes, with weights given by the corresponding cis-eQTL effects estimated from a reference transcriptome. They then use a linear regression model to assess the association between imputed gene expression and the test phenotype, thereby assuming that the effect of a cis-eQTL SNP on the test phenotype is a linear function of the eQTL’s estimated effect on the reference transcriptome. To increase TWAS robustness to this assumption, we propose a novel Variance-Component TWAS procedure (VC-TWAS) that assumes the effects of cis-eQTL SNPs on phenotype are random (with variance proportional to corresponding reference cis-eQTL effects) rather than fixed. VC-TWAS is applicable to both continuous and dichotomous phenotypes, and to both individual-level and summary-level GWAS data. Using simulated data, we show that VC-TWAS is more powerful than traditional TWAS, especially when eQTL genetic effects on the test phenotype are no longer a linear function of their eQTL genetic effects on the reference transcriptome. We further applied VC-TWAS to both individual-level (N=∼3.4K) and summary-level (N=∼54K) GWAS data to study Alzheimer’s dementia (AD). With the individual-level data, we detected 13 significant risk genes, including 6 known GWAS risk genes such as TOMM40 that were missed by existing TWAS methods. With the summary-level data, we detected 57 significant risk genes considering only cis-SNPs and 71 significant genes considering both cis- and trans-SNPs; these results also validated our findings with the individual-level GWAS data. Our VC-TWAS method is implemented in the TIGAR tool for public use.
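
The contrast drawn above between the standard burden-type TWAS test and a variance-component test can be illustrated with a minimal Python sketch. It is not the TIGAR implementation. The function names, the toy data, the choice of squared eQTL weights as variance weights, and the permutation p-value are all assumptions made for illustration, and the sketch covers only a continuous phenotype without covariates.

```python
import numpy as np

def burden_twas_z(G, w, y):
    """Burden-type TWAS: regress y on imputed expression G @ w, return the slope z-score."""
    grex = G @ w                      # imputed expression: weighted sum of SNP genotypes
    grex = grex - grex.mean()
    yc = y - y.mean()
    sxx = grex @ grex
    beta = (grex @ yc) / sxx          # simple linear regression slope
    resid = yc - beta * grex
    se = np.sqrt((resid @ resid) / (len(y) - 2) / sxx)
    return beta / se

def vc_twas_q(G, w, y):
    """Variance-component score statistic Q = r' G diag(w^2) G' r, with r = y - mean(y).

    Assumption: each SNP's variance weight is its squared eQTL effect.
    """
    r = y - y.mean()
    snp_scores = G.T @ r              # per-SNP score contributions
    return float(np.sum(w**2 * snp_scores**2))

def permutation_pvalue(G, w, y, n_perm=999, seed=0):
    """Permutation p-value for Q (a stand-in for an analytic null distribution)."""
    rng = np.random.default_rng(seed)
    q_obs = vc_twas_q(G, w, y)
    hits = sum(vc_twas_q(G, w, rng.permutation(y)) >= q_obs for _ in range(n_perm))
    return (1 + hits) / (1 + n_perm)

# Toy data (hypothetical): 300 samples, 5 cis-eQTL SNPs.
rng = np.random.default_rng(1)
n, m = 300, 5
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # genotype dosages 0/1/2
w = np.array([0.5, -0.3, 0.2, 0.1, -0.4])             # reference cis-eQTL weights
# Phenotype effects that are NOT a linear function of w (signs and sizes differ).
y = G @ np.array([0.4, 0.4, 0.0, 0.0, 0.4]) + rng.normal(size=n)

print("burden z-score:", burden_twas_z(G, w, y))
print("VC statistic Q:", vc_twas_q(G, w, y))
print("VC permutation p-value:", permutation_pvalue(G, w, y))
```

Because Q depends on the weights only through their squares, it is unchanged when the signs of the eQTL effects are flipped, whereas the burden z-score changes sign. This is one way to see why a variance-component test can be less sensitive to a mismatch between eQTL effects and phenotype effects.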

Author Summary

Existing Transcriptome-wide association studies (TWAS) tools make strong assumptions about the relationships among genetic variants, transcriptome, and phenotype that may be violated in practice, thereby substantially reducing the power. Here, we propose a novel variance-component TWAS method (VC-TWAS) that relaxes these assumptions and can be implemented with both individual-level and summary-level GWAS data. Our simulation studies showed that VC-TWAS achieved higher power compared to existing TWAS methods when the underlying assumptions required by existing TWAS tools were violated. We further applied VC-TWAS to both individual-level (N=∼3.4K) and summary-level (N=∼54K) GWAS data to study Alzheimer’s dementia (AD). With individual-level data, we detected 13 significant risk genes including 6 known GWAS risk genes such as TOMM40 that were missed by existing TWAS methods. Interestingly, 5 of these genes were shown to possess significant pleiotropic effects on AD pathology phenotypes, revealing possible biological mechanisms. With summary-level data of a larger sample size, we detected 57 significant risk genes considering only cis-SNPs and 71 significant genes considering both cis- and trans- SNPs, which also validated our findings with the individual-level GWAS data. In conclusion, VC-TWAS provides an important analytic tool for identifying risk genes whose effects on phenotypes might be mediated through transcriptomes.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

1. We derived the VC-TWAS approach for using only summary-level GWAS data (see our added subsection in the Methods section, with details provided in the supplement text).
2. We conducted simulation studies with different test sample sizes to show how the power of VC-TWAS varies with test sample size.
3. We applied VC-TWAS to the IGAP GWAS summary statistics of AD (with sample size ~54K). We detected 57 significant risk genes considering only cis-SNPs, including all the genes identified by using the individual-level GWAS data of the ROS/MAP and Mayo Clinic cohorts with sample size ~3.3K. We also detected 71 significant genes by considering both cis- and trans-SNPs.
4. We cited and discussed the existing method CoMM in our Introduction and compared its performance with VC-TWAS in simulation studies.
5. We moved some technical details of the method and data descriptions into the supplement text.

https://github.com/yanglab-emory/TIGAR
https://github.com/hakyim/PrediXcan
http://www.radc.rush.edu/
https://www.synapse.org/#!Synapse:syn3219045
https://www.synapse.org/#!Synapse:syn2910256
https://web.pasteur-lille.fr/en/recherche/u744/igap/igap_download.php
https://www.synapse.org/#!Synapse:syn22316792