GWAS significance thresholds for deep phenotyping studies can depend upon minor allele frequencies and sample size

Abstract

An important issue affecting genome-wide association studies with deep phenotyping (multiple correlated phenotypes) is determining a suitable family-wise significance threshold. Straightforward family-wise correction (Bonferroni) of p < 0.05 for 4.3 million genotypes and 335 phenotypes would give a threshold of p < 3.46E−11. This would be too conservative because it assumes all tests are independent. The effective number of tests, both phenotypic and genotypic, must be adjusted for the correlations between them. Spectral decomposition of the phenotype matrix and LD-based correction of the number of tested SNPs are currently used to determine an effective number of tests. In this paper, we compare these calculated estimates with permutation-determined family-wise significance thresholds. Permutations are performed by shuffling the individual IDs of the genotype vector for this dataset, which preserves the correlation of phenotypes. Our results demonstrate that the permutation threshold is influenced by the minor allele frequency (MAF) of the SNPs and by the number of individuals tested. For the more common SNPs (MAF > 0.1), the permutation family-wise threshold was in close agreement with spectral decomposition methods. However, for less common SNPs (0.05 < MAF ≤ 0.1), the permutation threshold calculated over all SNPs was off by orders of magnitude. This discrepancy applies at the number of individuals studied here (777) but not at very much larger numbers. Based on these findings, we propose that the threshold for a particular level of family-wise significance may need to be established using separate permutations of the actual data for several MAF bins.
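
To make the procedure in the abstract concrete, the following Python sketch first reproduces the Bonferroni arithmetic and then estimates permutation-based family-wise thresholds separately for each MAF bin. It is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`bonferroni`, `maf`, `corr_pvalue`, `permutation_thresholds`), the simulated genotypes and phenotypes, the number of permutations, and the Fisher z approximation used for the per-test p value are all hypothetical choices. Only the bin edges and the ID-shuffling idea are taken from the abstract. With the rounded counts quoted above, the Bonferroni arithmetic gives roughly 3.47E−11, close to the stated 3.46E−11; the small difference presumably reflects rounding of the SNP count.

```python
import math
import random


def bonferroni(alpha, n_snps, n_phenotypes):
    """Naive family-wise threshold that treats every test as independent."""
    return alpha / (n_snps * n_phenotypes)


def maf(genotypes):
    """Minor allele frequency of one SNP coded as 0/1/2 allele counts."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def corr_pvalue(x, y):
    """Two-sided p value for Pearson r (Fisher z approximation, an assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic SNP or constant phenotype: nothing to test
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / math.sqrt(sxx * syy)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def permutation_thresholds(snps, phenos, bins, n_perm=100, alpha=0.05, seed=0):
    """Return {(low, high): threshold} from the minimum p value per permutation.

    snps   : list of genotype lists (one list per SNP, one entry per individual)
    phenos : list of phenotype lists (same individual order as snps)
    bins   : list of (low, high) tuples; a SNP belongs if low < MAF <= high
    """
    rng = random.Random(seed)
    n_ind = len(phenos[0])
    by_bin = {b: [s for s in snps if b[0] < maf(s) <= b[1]] for b in bins}
    min_p = {b: [] for b in bins}
    for _ in range(n_perm):
        order = list(range(n_ind))
        rng.shuffle(order)  # shuffle individual IDs once per permutation
        for b, members in by_bin.items():
            if not members:
                continue
            best = 1.0
            for s in members:
                shuffled = [s[i] for i in order]  # same order for every SNP
                for ph in phenos:  # phenotypes stay untouched
                    best = min(best, corr_pvalue(shuffled, ph))
            min_p[b].append(best)
    thresholds = {}
    for b, values in min_p.items():
        if values:
            values.sort()
            k = max(int(alpha * len(values)) - 1, 0)  # alpha quantile of min p
            thresholds[b] = values[k]
    return thresholds


# Simulated toy data (hypothetical; far smaller than the 777-individual study).
demo_rng = random.Random(1)
N_IND = 60


def simulate_snp(freq):
    """Draw two alleles per individual at the given allele frequency."""
    return [int(demo_rng.random() < freq) + int(demo_rng.random() < freq)
            for _ in range(N_IND)]


snps = [simulate_snp(0.07) for _ in range(10)] + [simulate_snp(0.30) for _ in range(10)]
shared = [demo_rng.gauss(0, 1) for _ in range(N_IND)]  # common factor
phenos = [[s + 0.5 * demo_rng.gauss(0, 1) for s in shared] for _ in range(5)]
bins = [(0.05, 0.10), (0.10, 0.50)]

thresholds = permutation_thresholds(snps, phenos, bins, n_perm=100)
print("Bonferroni:", bonferroni(0.05, 4.3e6, 335))
for maf_bin, value in thresholds.items():
    print("MAF bin", maf_bin, "permutation threshold", value)
```

Applying the same shuffled order to every SNP leaves both the LD structure among SNPs and the correlations among phenotypes intact, so only the link between genotype and phenotype is broken. A real analysis would use many more permutations and the study's actual association test rather than this approximation.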

Fig. 1: Observed permutation p values as a function of minor allele frequency in 777 individuals and 335 correlated phenotypes.
Fig. 2: Family-wise permutation p values as a function of minor allele frequency (MAF) and sample size.


Acknowledgements

We thank Prof. Dan Nicolae of the University of Chicago and two anonymous referees for their discussions of issues pertinent to this paper.

Funding

NIH/NIMH grant 5R01MH103368: B-SNIP 2. PI: ESG. NIH/NIMH grant 5R01MH077862: Bipolar-Schizophrenia Consortium for Parsing Intermediate Phenotypes. PI: JAS (BSNIP1). NIH/NIMH grant 5R01MH077851: B-SNIP 2. PI: CAT. NIH/NIMH grant 5R01MH077945: B-SNIP 2. PI: GP. NIH/NIMH grant 5R01MH078113: B-SNIP 2. PI: MSK. NIH/NIMH grant 5R01MH103366: B-SNIP 2. PI: BAC. NIH/NIMH grant 5P50MH094267: Conte Center for Computational Systems Genomics of Neuropsychiatric Phenotypes. PI: A. Rzhetsky.

Author information

Corresponding authors

Correspondence to Huma Asif or Elliot S. Gershon.

Ethics declarations

Conflict of interest

MSK has received a grant from Sunovion and is a consultant to Forum Pharmaceuticals. CAT is a consultant to Intracellular Therapies, an ad hoc consultant to Takeda and Astellas, and has received a grant from Sunovion. The other authors report no conflicts of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Asif, H., Alliey-Rodriguez, N., Keedy, S. et al. GWAS significance thresholds for deep phenotyping studies can depend upon minor allele frequencies and sample size. Mol Psychiatry 26, 2048–2055 (2021). https://doi.org/10.1038/s41380-020-0670-3

  • DOI: https://doi.org/10.1038/s41380-020-0670-3
