Estimating Income Statistics from Grouped Data: Mean-constrained Integration over Brackets
Sociological Methodology (IF 2.4) Pub Date: 2018-07-09, DOI: 10.1177/0081175018782579
Paul A. Jargowsky, Christopher A. Wheeler

Researchers studying income inequality, economic segregation, and other subjects must often rely on grouped data, that is, data in which thousands or millions of observations have been reduced to counts of units by specified income brackets. The distribution of households within the brackets is unknown, and the highest incomes are often included in an open-ended top bracket, such as “$200,000 and above.” Common approaches to this estimation problem include calculating midpoint estimators with an assumed Pareto distribution in the top bracket and fitting a flexible multiple-parameter distribution to the data. The authors describe a new method, mean-constrained integration over brackets (MCIB), that uses only the bracket counts and the overall mean of the data yet is far more accurate than those methods. On the basis of an analysis of 297 metropolitan areas, MCIB produces estimates of the standard deviation, Gini coefficient, and Theil index that are correlated at 0.997, 0.998, and 0.991, respectively, with the values calculated from the underlying individual record data. Similar levels of accuracy are obtained for percentiles of the distribution and the shares of income by quintiles of the distribution. The technique can easily be extended to other distributional parameters and inequality statistics.
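
To make the grouped-data setting concrete, the following minimal Python sketch illustrates a simple midpoint estimator in which the open-ended top bracket is assigned whatever mean makes the grand mean match the known overall mean. This is not the authors' MCIB procedure, whose details the abstract does not give; it only illustrates how an overall mean can constrain the top bracket. The function names `bracket_means` and `grouped_gini`, the argument layout, and the example numbers are all hypothetical.

```python
from typing import List, Sequence


def bracket_means(lower: Sequence[float], upper: Sequence[float],
                  counts: Sequence[int], overall_mean: float) -> List[float]:
    """Assign one mean per bracket (illustrative, not the MCIB method).

    lower  : lower bound of every bracket, including the open top bracket
    upper  : upper bound of every closed bracket (one fewer than lower)
    counts : number of units in each bracket
    Closed brackets get their midpoint; the open top bracket gets the mean
    that makes the count-weighted grand mean equal overall_mean.
    """
    if len(lower) != len(counts) or len(upper) != len(lower) - 1:
        raise ValueError("expected len(upper) == len(lower) - 1 == len(counts) - 1")
    n_total = sum(counts)
    mids = [(lo + hi) / 2.0 for lo, hi in zip(lower[:-1], upper)]
    closed_income = sum(m * c for m, c in zip(mids, counts[:-1]))
    top_count = counts[-1]
    if top_count == 0:
        # Nobody in the top bracket: its mean is irrelevant, use the bound.
        return mids + [float(lower[-1])]
    top_mean = (overall_mean * n_total - closed_income) / top_count
    if top_mean < lower[-1]:
        raise ValueError("overall mean is inconsistent with midpoint assumption")
    return mids + [top_mean]


def grouped_gini(means: Sequence[float], counts: Sequence[int]) -> float:
    """Gini coefficient treating every unit as earning its bracket mean.

    This ignores within-bracket inequality, so it understates the true Gini.
    """
    pairs = sorted(zip(means, counts))
    n = sum(counts)
    total = sum(m * c for m, c in pairs)
    cum_pop = cum_inc = area = 0.0
    for m, c in pairs:
        next_pop = cum_pop + c / n
        next_inc = cum_inc + m * c / total
        # Trapezoid under the Lorenz curve for this bracket.
        area += (next_pop - cum_pop) * (cum_inc + next_inc) / 2.0
        cum_pop, cum_inc = next_pop, next_inc
    return 1.0 - 2.0 * area


# Hypothetical data: three brackets, the last one open-ended.
lower = [0, 50_000, 100_000]
upper = [50_000, 100_000]
counts = [50, 30, 20]
means = bracket_means(lower, upper, counts, overall_mean=80_000)
gini = grouped_gini(means, counts)
```

Because every unit in a bracket is placed at a single point, this sketch discards all within-bracket variation, which is one reason simple midpoint estimators tend to understate inequality.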

Updated: 2018-07-09