Differentially Private Simple Linear Regression
arXiv - CS - Cryptography and Security. Pub Date: 2020-07-10. DOI: arxiv-2007.05157. Authors: Daniel Alabi, Audra McMillan, Jayshree Sarathy, Adam Smith, and Salil Vadhan
Economics and social science research often requires analyzing datasets of
sensitive personal information at fine granularity, with models fit to small
subsets of the data. Unfortunately, such fine-grained analysis can easily
reveal sensitive individual information. We study algorithms for simple linear
regression that satisfy differential privacy, a constraint which guarantees
that an algorithm's output reveals little about any individual input data
record, even to an attacker with arbitrary side information about the dataset.
We consider the design of differentially private algorithms for simple linear
regression for small datasets, with tens to hundreds of datapoints, which is a
particularly challenging regime for differential privacy. Focusing on a
particular application to small-area analysis in economics research, we study
the performance of a spectrum of algorithms we adapt to the setting. We
identify key factors that affect their performance, showing through a range of
experiments that algorithms based on robust estimators (in particular, the
Theil-Sen estimator) perform well on the smallest datasets, but that other more
standard algorithms do better as the dataset size increases.
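
For readers unfamiliar with the Theil-Sen estimator named above, here is a minimal sketch of its classical, non-private form: the slope is the median of all pairwise slopes, and the intercept is the median of the residual offsets. This is not the differentially private algorithm studied in the paper (it adds no noise and offers no privacy guarantee), and the function name `theil_sen` is chosen here for illustration only.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Classical (NON-private) Theil-Sen fit; returns (slope, intercept).

    Illustrative only: this is the robust base estimator, not the
    differentially private variant evaluated in the paper.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    # Slope between every pair of points with distinct x-coordinates.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")

    # The median makes the estimate insensitive to a few outlying points.
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Four points on y = 2x + 1 plus one gross outlier at x = 4.
slope, intercept = theil_sen([0, 1, 2, 3, 4], [1, 3, 5, 7, 100])
```

Because a single record can move the median of the pairwise slopes only slightly, the estimate above stays at the line through the four clean points despite the outlier. That same robustness is a plausible reason such estimators suit small datasets under a privacy constraint, though the paper itself should be consulted for the actual private mechanisms.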
Updated: 2020-07-13