Generating Poisson-distributed differentially private synthetic data
The Journal of the Royal Statistical Society, Series A (Statistics in Society) (IF 1.5), Pub Date: 2021-07-01, DOI: 10.1111/rssa.12711
Harrison Quick

The dissemination of synthetic data can be an effective means of making information from sensitive data publicly available with a reduced risk of disclosure. While mechanisms exist for synthesizing data that satisfy formal privacy guarantees, these mechanisms do not typically resemble the models an end-user might use to analyse the data. More recently, the use of methods from the disease mapping literature has been proposed to generate spatially referenced synthetic data with high utility but without formal privacy guarantees. The objective for this paper is to help bridge the gap between the disease mapping and the differential privacy literatures. In particular, we generalize an approach for generating differentially private synthetic data currently used by the US Census Bureau to the case of Poisson-distributed count data in a way that accommodates heterogeneity in population sizes and allows for the infusion of prior information regarding the underlying event rates. Following a pair of small simulation studies, we illustrate the utility of the synthetic data produced by this approach using publicly available, county-level heart disease-related death counts. This study demonstrates the benefits of the proposed approach’s flexibility with respect to heterogeneity in population sizes and event rates while motivating further research to improve its utility.
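
To make concrete how prior information about event rates and unequal population sizes can enter a Poisson model for counts, the sketch below applies the standard Poisson-gamma conjugate update and then draws synthetic counts from the posterior predictive distribution. It is an illustration only: it is not the mechanism proposed in the paper, and on its own it carries no formal differential privacy guarantee. The function names, the prior values `a` and `b`, and the toy counts and populations are all hypothetical.

```python
import math
import random


def posterior_gamma(y, n, a, b):
    """Gamma(shape, rate) posterior for the event rate when
    y ~ Poisson(n * rate) and rate ~ Gamma(shape=a, rate=b)."""
    return a + y, b + n


def sample_poisson(mean, rng):
    """Draw from Poisson(mean) with Knuth's method, applied to chunks of
    mean <= 30 so that exp(-chunk) does not underflow for large means.
    Summing independent Poisson draws gives a Poisson draw of the total mean."""
    total = 0
    remaining = float(mean)
    while remaining > 0:
        chunk = min(remaining, 30.0)
        remaining -= chunk
        threshold = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def synthesize(counts, pops, a, b, seed=0):
    """Posterior predictive draw of one synthetic count per area.
    Assumption: the same Gamma(a, b) prior is used for every area."""
    rng = random.Random(seed)
    synthetic = []
    for y, n in zip(counts, pops):
        shape, rate = posterior_gamma(y, n, a, b)
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        lam = rng.gammavariate(shape, 1.0 / rate)
        synthetic.append(sample_poisson(n * lam, rng))
    return synthetic


# Hypothetical toy data: three areas with very different population sizes.
counts = [3, 40, 410]
pops = [1_000, 15_000, 200_000]
print(synthesize(counts, pops, a=2.0, b=1_000.0, seed=1))
```

In this sketch the prior acts like `a` additional events observed in `b` additional people, so it pulls small-population areas toward the prior rate `a / b` far more strongly than large ones.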

Updated: 2021-07-30