Adding measurement error to location data to protect subject confidentiality while allowing for consistent estimation of exposure effects
The Journal of the Royal Statistical Society: Series C (Applied Statistics) (IF 1.0), Pub Date: 2020-08-15, DOI: 10.1111/rssc.12439
Mahesh Karra, David Canning, Ryoko Sato

In public use data sets, it is desirable not to report a respondent's location precisely to protect subject confidentiality. However, the direct use of perturbed location data to construct explanatory exposure variables for regression models will generally make naive estimates of all parameters biased and inconsistent. We propose an approach where a perturbation vector, consisting of a random distance at a random angle, is added to a respondent's reported geographic co‐ordinates. We show that, as long as the distribution of the perturbation is public and there is an underlying prior population density map, external researchers can construct unbiased and consistent estimates of location‐dependent exposure effects by using numerical integration techniques over all possible actual locations, although coefficient confidence intervals are wider than if the true location data were known. We examine our method by using a Monte Carlo simulation exercise and apply it to a real world example using data on perceived and actual distance to a health facility in Tanzania.

Updated: 2020-10-07