Collision between biological process and statistical analysis revealed by mean‐centering
Journal of Animal Ecology (IF 3.5), Pub Date: 2020-11-12, DOI: 10.1111/1365-2656.13360
David F. Westneat 1 , Yimen G. Araya‐Ajoy 2 , Hassen Allegue 3 , Barbara Class 4 , Niels Dingemanse 5 , Ned Dochtermann 6 , Laszlo Zsolt Garamszegi 7 , Julien G.A. Martin 8 , Shinichi Nakagawa 9 , Denis Réale 3 , Holger Schielzeth 10, 11

Animal ecologists often collect hierarchically structured data and analyze these with linear mixed-effects models. Specific complications arise when the effect sizes of covariates vary on multiple levels (e.g., within vs among subjects). Mean-centering of covariates within subjects offers a useful approach in such situations, but is not without problems. A statistical model represents a hypothesis about the underlying biological process. Mean-centering within clusters assumes that the lower-level responses (e.g., within subjects) depend on the deviation from the subject mean (relative) rather than on absolute values of the covariate. This may or may not be biologically realistic. We show that mismatch between the nature of the generating (i.e., biological) process and the form of the statistical analysis produces major conceptual and operational challenges for empiricists. We explored the consequences of mismatches by simulating data with three response-generating processes differing in the source of correlation between a covariate and the response. These data were then analyzed by three different analysis equations. We asked how robustly different analysis equations estimate key parameters of interest and under which circumstances biases arise. Mismatches between generating and analytical equations created several intractable problems for estimating key parameters. The most widely misestimated parameter was the among-subject variance in response. We found that no single analysis equation was robust in estimating all parameters generated by all equations. Importantly, even when response-generating and analysis equations matched mathematically, bias in some parameters arose when sampling across the range of the covariate was limited. Our results have general implications for how we collect and analyze data. They also remind us more generally that conclusions from statistical analysis of data are conditional on a hypothesis, sometimes implicit, for the process(es) that generated the attributes we measure. We discuss strategies for real data analysis in the face of uncertainty about the underlying biological process.
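To make the within-subject mean-centering step concrete, here is a minimal Python sketch. It is not taken from the paper: the function name `within_subject_center` and the toy data are hypothetical, and the paper's actual generating and analysis equations are not reproduced here. The sketch only shows how a covariate is split into a subject mean (the among-subject part) and a deviation from that mean (the within-subject part).

```python
from collections import defaultdict
from statistics import fmean


def within_subject_center(subject_ids, x):
    """Split covariate x into an among-subject part (the subject mean)
    and a within-subject part (the deviation from that mean).

    Hypothetical helper for illustration; not the paper's code.
    """
    by_subject = defaultdict(list)
    for sid, value in zip(subject_ids, x):
        by_subject[sid].append(value)
    means = {sid: fmean(values) for sid, values in by_subject.items()}
    x_between = [means[sid] for sid in subject_ids]
    x_within = [value - means[sid] for sid, value in zip(subject_ids, x)]
    return x_between, x_within


# Assumed toy data: two subjects, three observations each.
subject_ids = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]

x_between, x_within = within_subject_center(subject_ids, x)
print(x_between)  # [2.0, 2.0, 2.0, 12.0, 12.0, 12.0]
print(x_within)   # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

Both subjects end up with identical within-subject values even though their absolute covariate values differ by 10. A model that uses `x_within` as the lower-level predictor therefore treats an observation at 3.0 in subject "a" and one at 13.0 in subject "b" as equivalent, which is the relative-versus-absolute assumption the abstract flags as possibly unrealistic.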

Updated: 2020-11-12