

  • Article

Techniques and Methods

Murine genetic models of obesity: type I error rates and the power of commonly used analyses as assessed by plasmode-based simulation

Abstract

Background/Objectives

Genetic contributors to obesity are frequently studied in murine models. However, the sample sizes of these studies are often small, and the data may violate assumptions of common statistical tests, such as normality of distributions. We examined whether, in these cases, type I error rates and power are affected by the choice of statistical test.

Subjects/Methods

We conducted “plasmode”-based simulation using empirical data on body mass (weight) from murine genetic models of obesity. For the type I error simulation, the weight distributions were adjusted so that there was no difference in means between control and mutant groups. For the power simulation, the distributions of the mutant groups were shifted to produce specified effect sizes. Three to twenty mice were resampled from the empirical distributions to create a plasmode. We then computed type I error rates and power on the plasmodes for five common tests: Student’s t test, Welch’s t test, the Wilcoxon rank sum test (also known as the Mann–Whitney U test), a permutation test, and a bootstrap test.
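The plasmode construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code (the Acknowledgements point to their Zenodo archive): we assume the null plasmode is made by shifting the mutant weights so their mean equals the control mean, that the power plasmode shifts the mutant weights by d times the pooled standard deviation (Cohen's d), and that mice are resampled with replacement within each group. The function names and the example weights are hypothetical.

```python
import random
import statistics


def pooled_sd(a, b):
    """Pooled standard deviation of two samples (the Cohen's d denominator)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5


def shift_mutant(control, mutant, d=0.0):
    """Shift the mutant weights so that mean(mutant) - mean(control) = d * pooled SD.

    d = 0 gives a null plasmode (equal means); d > 0 gives a power plasmode.
    A pure shift keeps the shape and spread of the empirical distribution,
    so the pooled SD (and hence the achieved d) is unchanged by the shift.
    """
    target = statistics.mean(control) + d * pooled_sd(control, mutant)
    offset = target - statistics.mean(mutant)
    return [w + offset for w in mutant]


def resample_plasmode(control, mutant, n, rng):
    """Draw n mice with replacement from each (adjusted) empirical distribution."""
    return rng.choices(control, k=n), rng.choices(mutant, k=n)


# Hypothetical body-mass data in grams (not from the study).
control = [24.1, 25.3, 23.8, 26.0, 24.9, 25.5, 23.2, 24.4]
mutant = [38.5, 41.2, 35.9, 44.0, 39.7, 47.3, 36.1, 40.8]

null_mutant = shift_mutant(control, mutant, d=0.0)  # type I error plasmode source
alt_mutant = shift_mutant(control, mutant, d=1.5)   # power plasmode source, d = 1.5
rng = random.Random(2020)
ctrl_sample, mut_sample = resample_plasmode(control, alt_mutant, n=5, rng=rng)
```

Repeating the resampling step many times for each n from 3 to 20 yields the collection of plasmodes on which each test is run.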

Results

With small samples (≤5), we observed type I error inflation for all tests except the bootstrap test. Type I error inflation decreased as sample size increased (≥8) but did not disappear. The Wilcoxon test should be avoided because of the heterogeneity of the distributions. For power, all tests departed from the reference with small samples. Compared with the other tests, the bootstrap test had less power with small samples.

Conclusions

Overall, the bootstrap test is recommended for small samples to avoid type I error inflation, but this benefit comes at the cost of lower power. When sample size is large enough, Welch’s t test is recommended because of high power with minimal type I error inflation.
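For reference, Welch's t test compares two group means without assuming equal variances. In its standard textbook form (the notation here is ours, with sample means x̄ᵢ, sample variances sᵢ² and group sizes nᵢ), the statistic and its Welch–Satterthwaite approximate degrees of freedom are:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
$$

Student's t test instead pools the variances, $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, and uses $n_1 + n_2 - 2$ degrees of freedom, which is why it is more sensitive to unequal variances between control and mutant groups.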


Fig. 1: Summary of the simulation protocol.
Fig. 2: Summary of baseline body mass (weight).
Fig. 3: Estimated type I error rate from the plasmode-based simulation (significance level = 0.05).
Fig. 4: Summary of type I error rate for each sample size (significance level = 0.05).
Fig. 5: Estimated power with effect size set to Cohen’s d of 1.5 for each test.



Acknowledgements

This study was supported in part by NIH grants 3P30DK056336 (DBA), R25DK099080 (DBA), R25HL124208 (DBA) and Japan Society for the Promotion of Science (JSPS) KAKENHI grant 18K18146 (KE). The data analyses and simulation were performed using a supercomputer, Karst, which was supported in part by Lilly Endowment, Inc., through its support for the Indiana University Pervasive Technology Institute, and in part by the Indiana METACyt Initiative. The Indiana METACyt Initiative at IU was also supported in part by Lilly Endowment, Inc. The opinions expressed are those of the authors and do not necessarily represent those of the NIH or any other organization. All code used in this study will be available at https://doi.org/10.5281/zenodo.1488359. Supplementary information is available at the International Journal of Obesity’s website.

Author information


Contributions

DBA designed the research. DLSJ and AWB gathered the data. KE and AWB performed the statistical analysis. DBA, DLSJ, and UB assisted in data analysis. All authors were involved in writing or editing the paper and had final approval of the submitted and published versions.

Corresponding authors

Correspondence to Keisuke Ejima or David B. Allison.

Ethics declarations

Conflict of interest

UB has no conflicts of interest. In the last 12 months, DBA has received personal payments or promises for same from for-profit organizations including: Biofortis; Gelesis; Fish & Richardson, P.C.; IKEA; Law Offices of Ronald Marron; Sage Publishing; Tomasik, Kotin & Kasserman LLC; Medpace; Nestle; WW (formerly Weight Watchers International, LLC) and was an unpaid member of the International Life Sciences Institute North America Board of Trustees. In the last 12 months, AWB has received personal payments or paid travel from: American Society for Nutrition, Indiana University, Kentuckiana Health Collaborative, Rippe Lifestyle Institute, Inc. Indiana University has received grants from the following entities to support some of the authors’ research or educational activities: NIH; Alliance for Potato Research and Education; American Federation for Aging Research; Dairy Management Inc; Herbalife; Laura and John Arnold Foundation; Oxford University Press; Sloan Foundation; University of Alabama at Birmingham. In the last 12 months, DLSJ has received personal payments or paid travel from: University of Alabama at Birmingham. University of Alabama at Birmingham has received grants from the following entities to support some of the authors’ research or educational activities: NIH; Alliance for Potato Research and Education. In the last 12 months, KE has received personal payments or paid travel from: The University of Tokyo. The University of Tokyo has received grants from the following entities to support some of the authors’ research or educational activities: Japan Society for the Promotion of Science.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ejima, K., Brown, A.W., Smith, D.L. et al. Murine genetic models of obesity: type I error rates and the power of commonly used analyses as assessed by plasmode-based simulation. Int J Obes 44, 1440–1449 (2020). https://doi.org/10.1038/s41366-020-0554-2


  • DOI: https://doi.org/10.1038/s41366-020-0554-2
