Imprecise Imputation: A Nonparametric Micro Approach Reflecting the Natural Uncertainty of Statistical Matching with Categorical Data, Journal of Official Statistics


Imprecise Imputation: A Nonparametric Micro Approach Reflecting the Natural Uncertainty of Statistical Matching with Categorical Data
Journal of Official Statistics (IF 0.5), Pub Date: 2019-09-01, DOI: 10.2478/jos-2019-0025
Eva Endres, Paul Fink, Thomas Augustin

Abstract

Statistical matching is the term for the integration of two or more data files that share a partially overlapping set of variables. Its aim is to obtain joint information on variables collected in different surveys based on different observation units. This naturally leads to an identification problem, since there is no observation that contains information on all variables of interest. We develop the first statistical matching micro approach reflecting the natural uncertainty of statistical matching arising from the identification problem in the context of categorical data. A complete synthetic file is obtained by imprecise imputation, replacing missing entries by sets of suitable values. Altogether, we discuss three imprecise imputation strategies and propose ideas for potential refinements. Additionally, we show how the results of imprecise imputation can be embedded into the theory of finite random sets, providing tight lower and upper bounds for probability statements. The results, based on a newly developed simulation design that is customised to the specific requirements for assessing the quality of a statistical matching procedure for categorical data, corroborate that the narrowness of these bounds is practically relevant and that these bounds almost always cover the true parameters.
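The idea of set-valued imputation and the resulting probability bounds can be illustrated with a minimal sketch. It is not one of the three strategies from the paper: it assumes the most conservative choice, in which every missing entry is replaced by the full set of categories of that variable. The bounds are then computed in the usual finite-random-set manner, counting records whose imputed set lies entirely inside the event (lower bound) or merely overlaps it (upper bound). The variable names `x`, `y`, `z`, the toy records, and the helper functions `impute_imprecisely` and `probability_bounds` are all hypothetical.

```python
from fractions import Fraction


def impute_imprecisely(records, domains):
    """Replace each missing entry by the full category set of its variable.

    Assumption: this is the most conservative imputation, chosen for
    illustration only. Observed entries become singleton sets.
    """
    completed = []
    for rec in records:
        completed.append({
            var: frozenset([rec[var]]) if rec.get(var) is not None
            else frozenset(cats)
            for var, cats in domains.items()
        })
    return completed


def probability_bounds(completed, event):
    """Lower and upper relative-frequency bounds for a rectangular event.

    `event` maps a variable to the set of categories that count as a hit;
    variables not mentioned are unrestricted.
    """
    n = len(completed)
    # Lower bound: the imputed set is contained in the event for sure.
    lower = sum(all(rec[v] <= event[v] for v in event) for rec in completed)
    # Upper bound: the imputed set is at least compatible with the event.
    upper = sum(all(rec[v] & event[v] for v in event) for rec in completed)
    return Fraction(lower, n), Fraction(upper, n)


# Hypothetical toy data: file A observes (x, y), file B observes (x, z).
domains = {"x": {"a", "b"}, "y": {"low", "high"}, "z": {"yes", "no"}}
file_a = [{"x": "a", "y": "low"}, {"x": "b", "y": "high"}]
file_b = [{"x": "a", "z": "yes"}, {"x": "b", "z": "no"}]

synthetic = impute_imprecisely(file_a + file_b, domains)
joint = probability_bounds(synthetic, {"y": {"high"}, "z": {"yes"}})
marginal = probability_bounds(synthetic, {"y": {"high"}})
print(joint, marginal)
```

Because `y` and `z` are never observed together, the lower bound for the joint event is zero in this toy file, which reflects the identification problem described in the abstract. Narrower bounds would require a less conservative choice of imputed sets, for example one that uses the common variable `x`.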


Updated: 2019-09-01
