Yule–Simpson’s paradox: the probabilistic versus the empirical conundrum
Statistical Methods & Applications (IF 1), Pub Date: 2020-07-16, DOI: 10.1007/s10260-020-00536-4
Aris Spanos

The current literature views Simpson’s paradox as a probabilistic conundrum by taking the premises (probabilities/parameters/frequencies) as known. In such a context, it is shown that the paradox arises within a very small subset of the relevant parameter space, rendering the paradox unlikely to occur in real data. The problem, however, is that the probabilistic perspective ignores certain crucial empirical (data, statistical) issues raised by the original Pearson and Yule papers on ‘spurious’ association reversals. Placing the paradox in a broader empirical framework that begins with the raw data \(\mathbf{z}_{0}\) and an appropriately selected statistical model \(\mathcal{M}_{\boldsymbol{\theta}}(\mathbf{x})\), the discussion elucidates the original Yule–Pearson conundrum by formalizing its notion of ‘spurious or fictitious associations’ into ‘statistically untrustworthy associations’ stemming from a misspecified \(\mathcal{M}_{\boldsymbol{\theta}}(\mathbf{x})\), that is, from invalid probabilistic assumptions imposed on \(\mathbf{z}_{0}\). It is shown that several empirical examples used to illustrate Simpson’s paradox in the current literature constitute examples of the Yule–Pearson untrustworthy association reversals. The empirical perspective is used to revisit the causal explanation of the paradox and make a case that several widely accepted causal claims are questionable on statistical adequacy grounds. It is also used to propose a procedure to detect and account for the ‘third entity’ in the paradox, as well as (reliably) select among different potential causal explanations, such as collider, mediator or confounder, on empirical grounds.
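
As a rough illustration of the purely probabilistic form of the paradox that the abstract starts from, the following minimal sketch checks for an association reversal between subgroup rates and pooled rates. The counts, the group and arm labels, and the helper names `rate` and `pooled` are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch treats the frequencies as known, so it says nothing about the statistical adequacy issues the paper raises.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts, for illustration only.
counts = {
    "group_1": {"treated": (81, 87), "control": (234, 270)},
    "group_2": {"treated": (192, 263), "control": (55, 80)},
}


def rate(successes, trials):
    """Exact success rate as a fraction, avoiding floating-point ties."""
    return Fraction(successes, trials)


def pooled(arm):
    """Success rate for one arm after aggregating over all groups."""
    successes = sum(counts[g][arm][0] for g in counts)
    trials = sum(counts[g][arm][1] for g in counts)
    return rate(successes, trials)


# Within each group: does the treated arm have the higher success rate?
within = {
    g: rate(*counts[g]["treated"]) > rate(*counts[g]["control"])
    for g in counts
}

# After pooling the groups: does the treated arm still have the higher rate?
aggregate = pooled("treated") > pooled("control")

# A reversal: treated is ahead in every group but behind in the pooled table.
reversal = all(within.values()) and not aggregate

print(within, aggregate, reversal)
```

With these illustrative counts the treated arm leads in both groups yet trails once the groups are pooled, because the arms are unevenly distributed across the groups.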




Updated: 2020-07-24