Violating the normality assumption may be the lesser of two evils
Behavior Research Methods ( IF 5.953 ) Pub Date : 2021-05-07 , DOI: 10.3758/s13428-021-01587-5
Ulrich Knief 1 , Wolfgang Forstmeier 2

When data are not normally distributed, researchers are often uncertain whether it is legitimate to use tests that assume Gaussian errors, or whether one has to either model a more specific error structure or use randomization techniques. Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation. We find that Gaussian models are robust to non-normality over a wide range of conditions, meaning that p values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also performed well in terms of power across all simulated scenarios. Parameter estimates were mostly unbiased and precise except if sample sizes were small or the distribution of the predictor was highly skewed. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise for instance when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data. Hence, newly developed statistical methods not only bring new opportunities, but they can also pose new threats to reliability. We argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and particularly difficult to check during peer review. Scientists and reviewers who are not fully aware of the risks might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data.
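The kind of Monte Carlo check described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation code: the two-group design, the sample size of 50 per group, the exponential distribution standing in for "non-normal data", and the names `pooled_t` and `type1_error_rate` are all assumptions made for illustration. Both groups are drawn from the same distribution, so the null hypothesis is true and every significant result is a false positive.

```python
import random


def mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance.

    This is the test statistic of a Gaussian linear model with a single
    binary predictor (group membership).
    """
    na, nb = len(a), len(b)
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    standard_error = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (mean_a - mean_b) / standard_error


def type1_error_rate(draw, n_per_group=50, n_sims=2000, t_crit=1.9845, seed=1):
    """Share of simulated null data sets declared significant.

    draw   -- hypothetical callable taking a random.Random and returning one value
    t_crit -- two-sided 5% critical value of the t distribution with 98 degrees
              of freedom (hard-coded; it must be changed if n_per_group changes)
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        group_a = [draw(rng) for _ in range(n_per_group)]
        group_b = [draw(rng) for _ in range(n_per_group)]
        if abs(pooled_t(group_a, group_b)) > t_crit:
            false_positives += 1
    return false_positives / n_sims


# Normal errors (assumption of the test met) versus skewed exponential errors.
rate_normal = type1_error_rate(lambda rng: rng.gauss(0.0, 1.0))
rate_skewed = type1_error_rate(lambda rng: rng.expovariate(1.0))
print(f"type I error rate, normal data: {rate_normal:.3f}")
print(f"type I error rate, skewed data: {rate_skewed:.3f}")
```

If the Gaussian test is robust in this setting, both rates should land close to the nominal 0.05. Replacing the `draw` callable with a heavier-tailed or outlier-prone distribution, or lowering the alpha level through `t_crit`, is one way to probe the conditions under which the abstract reports that p values become less reliable.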




Updated: 2021-05-08