Abstract
Background/Objectives
Genetic contributors to obesity are frequently studied in murine models. However, the sample sizes of these studies are often small, and the data may violate assumptions of common statistical tests, such as normality of distributions. We examined whether, in these cases, type I error rates and power are affected by the choice of statistical test.
Subjects/Methods
We conducted “plasmode”-based simulations using empirical data on body mass (weight) from murine genetic models of obesity. For the type I error simulation, the weight distributions were adjusted to ensure no difference in means between control and mutant groups. For the power simulation, the distributions of the mutant groups were shifted to ensure specific effect sizes. Three to twenty mice were resampled from the empirical distributions to create a plasmode. We then computed type I error rates and power for five common tests on the plasmodes: Student’s t test, Welch’s t test, the Wilcoxon rank sum test (also known as the Mann–Whitney U test), the permutation test, and the bootstrap test.
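The resampling scheme described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the weights are made-up values, the function names (`make_plasmode`, `permutation_p`, `rejection_rate`) are hypothetical, equal group sizes are assumed, the shift is expressed in raw grams rather than as a standardized effect size, and only a permutation test on the difference in means is shown.

```python
import random
import statistics

# Made-up body weights (g) standing in for the empirical distributions.
CONTROL = [24.1, 25.3, 26.0, 26.8, 27.5, 28.2, 29.0, 30.4]
MUTANT = [31.0, 34.5, 36.2, 39.8, 41.1, 44.7, 47.3, 52.0]

def make_plasmode(control, mutant, n, shift=0.0, rng=random):
    """Resample n mice per group after aligning the group means.

    The mutant weights are moved so their mean equals the control mean
    (null hypothesis of equal means holds), then moved by `shift` grams
    to impose a chosen mean difference for power simulations.
    """
    offset = statistics.fmean(control) - statistics.fmean(mutant)
    aligned_mutant = [w + offset + shift for w in mutant]
    return rng.choices(control, k=n), rng.choices(aligned_mutant, k=n)

def permutation_p(x, y, n_perm=200, rng=random):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(x)])
                   - statistics.fmean(pooled[len(x):]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)

def rejection_rate(control, mutant, n, shift=0.0, alpha=0.05,
                   n_sim=200, seed=0):
    """Fraction of plasmodes in which the test rejects at level alpha.

    With shift=0 this estimates the type I error rate; with shift != 0
    it estimates power.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x, y = make_plasmode(control, mutant, n, shift, rng)
        if permutation_p(x, y, rng=rng) <= alpha:
            rejections += 1
    return rejections / n_sim

if __name__ == "__main__":
    print("type I error:", rejection_rate(CONTROL, MUTANT, n=5))
    print("power:", rejection_rate(CONTROL, MUTANT, n=5, shift=10.0))
```

Because the two groups keep their own shapes and variances after the means are aligned, a rejection rate above the nominal level in the `shift=0` case reflects the kind of type I error inflation the study examines.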
Results
With small samples (≤5), we observed type I error inflation for all tests except the bootstrap test. The inflation decreased as sample size increased (≥8) but did not disappear. The Wilcoxon test should be avoided because the distributions were heterogeneous. For power, a departure from the reference was observed with small samples for all tests. Compared with the other tests, the bootstrap test had less power with small samples.
Conclusions
Overall, the bootstrap test is recommended for small samples to avoid type I error inflation, but this benefit comes at the cost of lower power. When sample size is large enough, Welch’s t test is recommended because of high power with minimal type I error inflation.
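For readers unfamiliar with bootstrap hypothesis testing, the sketch below shows one common variant for comparing two means. It is an assumption, not necessarily the procedure used in this study: each group is centered on its own mean so that the null hypothesis holds in the resampling population, and the unstudentized difference in means is used as the test statistic. The function name `bootstrap_p` and the example weights are hypothetical.

```python
import random
from statistics import fmean

def bootstrap_p(x, y, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for a difference in group means.

    Assumed variant: center each group on its own mean, resample each
    group with replacement, and count how often the resampled mean
    difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fmean(x) - fmean(y))
    mean_x, mean_y = fmean(x), fmean(y)
    x0 = [v - mean_x for v in x]  # group x with its mean removed
    y0 = [v - mean_y for v in y]  # group y with its mean removed
    hits = 0
    for _ in range(n_boot):
        diff = (fmean(rng.choices(x0, k=len(x)))
                - fmean(rng.choices(y0, k=len(y))))
        if abs(diff) >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_boot + 1)

if __name__ == "__main__":
    control = [24.1, 26.0, 27.5, 29.0, 30.4]  # made-up weights (g)
    mutant = [31.0, 36.2, 41.1, 44.7, 52.0]   # made-up weights (g)
    print("bootstrap p-value:", bootstrap_p(control, mutant))
```

Resampling each group separately preserves unequal variances between groups, which is one reason a test of this kind can behave differently from rank-based tests when the distributions are heterogeneous.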
Acknowledgements
This study was supported in part by NIH grants 3P30DK056336 (DBA), R25DK099080 (DBA), and R25HL124208 (DBA) and by Japan Society for the Promotion of Science (JSPS) KAKENHI grant 18K18146 (KE). The data analyses and simulation were performed using a supercomputer, Karst, which was supported in part by Lilly Endowment, Inc., through its support for the Indiana University Pervasive Technology Institute, and in part by the Indiana METACyt Initiative. The Indiana METACyt Initiative at IU was also supported in part by Lilly Endowment, Inc. The opinions expressed are those of the authors and do not necessarily represent those of the NIH or any other organization. All code used in this study will be available at https://doi.org/10.5281/zenodo.1488359. Supplementary information is available at the International Journal of Obesity’s website.
Author information
Contributions
DBA designed the research. DLSJ and AWB gathered the data. KE and AWB performed statistical analysis. DBA, DLSJ, and UB assisted in data analysis. All authors were involved in writing or editing the paper and had final approval of the submitted and published versions.
Ethics declarations
Conflict of interest
UB has no conflicts of interest. In the last 12 months, DBA has received personal payments or promises for same from for-profit organizations including: Biofortis; Gelesis; Fish & Richardson, P.C.; IKEA; Law Offices of Ronald Marron; Sage Publishing; Tomasik, Kotin & Kasserman LLC; Medpace; Nestle; WW (formerly Weight Watchers International, LLC) and was an unpaid member of the International Life Sciences Institute North America Board of Trustees. In the last 12 months, AWB has received personal payments or paid travel from: American Society for Nutrition, Indiana University, Kentuckiana Health Collaborative, Rippe Lifestyle Institute, Inc. Indiana University has received grants from the following entities to support some of the authors’ research or educational activities: NIH; Alliance for Potato Research and Education; American Federation for Aging Research; Dairy Management Inc; Herbalife; Laura and John Arnold Foundation; Oxford University Press; Sloan Foundation; University of Alabama at Birmingham. In the last 12 months, DLSJ has received personal payments or paid travel from: University of Alabama at Birmingham. University of Alabama at Birmingham has received grants from the following entities to support some of the authors’ research or educational activities: NIH; Alliance for Potato Research and Education. In the last 12 months, KE has received personal payments or paid travel from: The University of Tokyo. The University of Tokyo has received grants from the following entities to support some of the authors’ research or educational activities: Japan Society for the Promotion of Science.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ejima, K., Brown, A.W., Smith, D.L. et al. Murine genetic models of obesity: type I error rates and the power of commonly used analyses as assessed by plasmode-based simulation. Int J Obes 44, 1440–1449 (2020). https://doi.org/10.1038/s41366-020-0554-2
DOI: https://doi.org/10.1038/s41366-020-0554-2