Cumulative deviation of a subpopulation from the full population
Journal of Big Data (IF 8.1), Pub Date: 2021-09-06, DOI: 10.1186/s40537-021-00494-y
Mark Tygert

Assessing equity in treatment of a subpopulation often involves assigning numerical “scores” to all individuals in the full population such that similar individuals get similar scores; matching via propensity scores or appropriate covariates is common, for example. Given such scores, individuals with similar scores may or may not attain similar outcomes independent of the individuals’ memberships in the subpopulation. The traditional graphical methods for visualizing inequities are known as “reliability diagrams” or “calibration plots,” which bin the scores into a partition of all possible values, and for each bin plot both the average outcome for individuals in the subpopulation and the average outcome for all individuals; comparing the graph for the subpopulation with that for the full population gives some sense of how the averages for the subpopulation deviate from the averages for the full population. Unfortunately, real data sets contain only finitely many observations, limiting the usable resolution of the bins, and so the conventional methods can obscure important variations due to the binning. Fortunately, plotting the cumulative deviation of the subpopulation from the full population, as proposed in this paper, sidesteps the problematic coarse binning. The cumulative plots encode subpopulation deviation directly as the slopes of secant lines for the graphs. Slope is easy to perceive even when the constant offsets of the secant lines are irrelevant. The cumulative approach avoids binning that smooths over deviations of the subpopulation from the full population. Such cumulative aggregation furnishes both high-resolution graphical methods and simple scalar summary statistics (analogous to those of Kuiper and of Kolmogorov and Smirnov used in statistical significance testing for comparing probability distributions).
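
The idea of a cumulative deviation curve and its scalar summaries can be illustrated with a minimal sketch. This is not the paper's exact construction: the function names, the `window` parameter, and the way the full-population reference outcome is estimated (a plain average over neighbors in score order) are all assumptions made for illustration only.

```python
import numpy as np


def cumulative_deviation(scores, outcomes, in_subpop, window=5):
    """Illustrative cumulative deviation of a subpopulation (hypothetical helper).

    For each subpopulation member, the reference outcome is ASSUMED to be the
    mean outcome of the full-population members within `window` positions in
    score order. The abstract does not specify this estimator.
    """
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    in_subpop = np.asarray(in_subpop, dtype=bool)

    # Sort the full population by score.
    order = np.argsort(scores, kind="stable")
    s, r, m = scores[order], outcomes[order], in_subpop[order]

    # Positions of subpopulation members within the sorted full population.
    idx = np.flatnonzero(m)
    n = idx.size

    # Reference outcome near each subpopulation member's score.
    ref = np.empty(n)
    for j, i in enumerate(idx):
        lo = max(0, i - window)
        hi = min(s.size, i + window + 1)
        ref[j] = r[lo:hi].mean()

    # Cumulative sum of (subpopulation outcome - reference), normalized by n.
    # The slope of a secant line over a score range reflects the average
    # deviation of the subpopulation on that range.
    cum = np.concatenate(([0.0], np.cumsum(r[idx] - ref) / n))
    return s[idx], cum


def kuiper_like(cum):
    """Range of the cumulative curve (analogous to the Kuiper statistic)."""
    return float(cum.max() - cum.min())


def kolmogorov_smirnov_like(cum):
    """Maximum absolute value of the curve (analogous to Kolmogorov-Smirnov)."""
    return float(np.abs(cum).max())


if __name__ == "__main__":
    # Synthetic data, for illustration only.
    rng = np.random.default_rng(0)
    scores = rng.uniform(size=1000)
    member = rng.uniform(size=1000) < 0.3
    # Subpopulation members with scores above 0.5 get systematically higher outcomes.
    prob = np.clip(scores + 0.2 * (member & (scores > 0.5)), 0.0, 1.0)
    outcomes = (rng.uniform(size=1000) < prob).astype(float)
    _, cum = cumulative_deviation(scores, outcomes, member, window=25)
    print(kuiper_like(cum), kolmogorov_smirnov_like(cum))
```

Plotting `cum[1:]` against the sorted subpopulation scores would give a curve in which a steadily rising stretch indicates a score range where the subpopulation fares better than the full population, without choosing any bins for the curve itself (the `window` here only affects the assumed reference estimate).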




Updated: 2021-09-07