Pathologies of Between-Groups Principal Components Analysis in Geometric Morphometrics
Evolutionary Biology (IF 2.5), Pub Date: 2019-10-11, DOI: 10.1007/s11692-019-09484-8
Fred L. Bookstein

Good empirical applications of geometric morphometrics (GMM) typically involve several times more variables than specimens, a situation the statistician refers to as “high p/n,” where p is the count of variables and n the count of specimens. This note calls your attention to two predictable catastrophic failures of one particular multivariate statistical technique, between-groups principal components analysis (bgPCA), in this high-p/n setting. The more obvious pathology is this: when applied to the patternless (null) model of p identically distributed Gaussians over groups of the same size, both bgPCA and its algebraic equivalent, partial least squares (PLS) analysis against group, necessarily generate the appearance of huge equilateral group separations that are fictitious (absent from the statistical model). When specimen counts by group vary greatly or when any group includes fewer than about ten specimens, an even worse failure of the technique obtains: the smaller the group, the more likely a bgPCA is to fictitiously identify that group as the end-member of one of its derived axes. For these two reasons, when used in GMM and other high-p/n settings the bgPCA method very often leads to invalid or insecure biological inferences. This paper demonstrates and quantifies these and other pathological outcomes both for patternless models and for models with one or two valid factors, then offers suggestions for how GMM practitioners should protect themselves against the consequences for inference of these lamentably predictable misrepresentations. The bgPCA method should never be used unskeptically—it is always untrustworthy, never authoritative—and whenever it appears in partial support of any biological inference it must be accompanied by a wide range of diagnostic plots and other challenges, many of which are presented here for the first time.
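The first pathology can be illustrated with a small simulation. The sketch below is not the paper's own code. It assumes a plain, unweighted bgPCA (principal axes of the group means, with every specimen projected onto them) and draws data from the patternless model of p independent standard Gaussians with equal group sizes. The function names, the seed, the sample sizes, and the separation index (variance of the group centroids divided by the mean within-group variance of the scores) are all illustrative choices, not taken from the source.

```python
import numpy as np


def bgpca_scores(X, groups):
    """Project specimens onto the principal axes of the group means.

    Assumption: unweighted bgPCA, which is adequate for equal group sizes.
    """
    Xc = X - X.mean(axis=0)                       # center on the grand mean
    labels = np.unique(groups)
    means = np.vstack([Xc[groups == g].mean(axis=0) for g in labels])
    means_c = means - means.mean(axis=0)          # center the group means
    # Right singular vectors of the centered means are the bgPCA axes;
    # k groups give at most k - 1 informative axes.
    _, _, Vt = np.linalg.svd(means_c, full_matrices=False)
    axes = Vt[: len(labels) - 1]
    return Xc @ axes.T


def separation_ratio(scores, groups):
    """Centroid variance divided by mean within-group variance of scores."""
    labels = np.unique(groups)
    centroids = np.vstack([scores[groups == g].mean(axis=0) for g in labels])
    between = centroids.var(axis=0).sum()
    within = np.mean([scores[groups == g].var(axis=0).sum() for g in labels])
    return between / within


def null_model_ratio(p, n_per_group=10, k=3, seed=0):
    """Simulate the patternless model and return the separation index."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((k * n_per_group, p))  # no true group structure
    groups = np.repeat(np.arange(k), n_per_group)
    return separation_ratio(bgpca_scores(X, groups), groups)


ratio_low = null_model_ratio(p=5)      # p/n well below 1
ratio_high = null_model_ratio(p=3000)  # p/n = 100
print(f"separation index, p=5:    {ratio_low:.2f}")
print(f"separation index, p=3000: {ratio_high:.2f}")
```

Under these assumptions the index should be far larger in the high-p/n run even though both data sets contain no group signal at all, which is the fictitious separation the abstract describes. A scatterplot of the first two score columns, colored by group, shows the same effect visually.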

Updated: 2019-10-11