Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics
Genome Research (IF 6.2), Pub Date: 2023-07-01, DOI: 10.1101/gr.277674.123
Rajagopal Venkatesaramani 1, Zhiyu Wan 2, Bradley A Malin 2, Yevgeniy Vorobeychik 3

The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query responses to the presence/absence of alleles of interest using web services called Beacons. However, even such limited releases are susceptible to likelihood ratio–based membership-inference attacks. Several approaches have been proposed to preserve privacy, which either suppress a subset of genomic variants or modify query responses for specific variants (e.g., adding noise, as in differential privacy). However, many of these approaches result in a significant utility loss, either suppressing many variants or adding a substantial amount of noise. In this paper, we introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy with respect to membership-inference attacks based on likelihood ratios, combining variant suppression and modification. We consider two attack models. In the first, an attacker applies a likelihood ratio test to make membership-inference claims. In the second model, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals in the data set and those who are not. We further introduce highly scalable approaches for approximately solving the privacy–utility tradeoff problem when information is in the form of either summary statistics or presence/absence queries. Finally, we show that the proposed approaches outperform the state of the art in both utility and privacy through an extensive evaluation with public data sets.
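To make the first attack model concrete, here is a minimal Python sketch of a likelihood ratio membership score computed against released allele frequencies. It is an illustration under stated assumptions, not the authors' formulation: each genotype is reduced to a binary indicator (1 if the target carries the alternate allele at a variant), the attacker is assumed to know reference-population frequencies, the attacker's threshold is a fixed hypothetical value (the second, separation-aware threshold model is not modeled), and the names `lrt_score` and `claims_membership` are hypothetical. Variant suppression is represented by dropping a variant's term from the sum.

```python
import math
from typing import Iterable, Sequence


def lrt_score(genotype: Sequence[int],
              pool_freq: Sequence[float],
              ref_freq: Sequence[float],
              suppressed: Iterable[int] = ()) -> float:
    """Simplified log-likelihood-ratio membership score (illustrative only).

    genotype[j]  -- 1 if the target carries the alternate allele at variant j, else 0
    pool_freq[j] -- allele frequency released for the data set (summary statistic)
    ref_freq[j]  -- allele frequency in a reference population (assumed known)
    suppressed   -- indices of variants withheld from the release
    """
    hidden = set(suppressed)
    score = 0.0
    for j, (x, p_hat, p) in enumerate(zip(genotype, pool_freq, ref_freq)):
        if j in hidden:
            continue  # a suppressed variant contributes no evidence
        if not (0.0 < p_hat < 1.0 and 0.0 < p < 1.0):
            raise ValueError(f"frequencies at variant {j} must lie strictly in (0, 1)")
        if x:
            score += math.log(p_hat / p)  # evidence from carrying the allele
        else:
            score += math.log((1.0 - p_hat) / (1.0 - p))  # evidence from not carrying it
    return score


def claims_membership(score: float, threshold: float) -> bool:
    """The attacker claims membership when the score exceeds the threshold."""
    return score > threshold


# Hypothetical toy data: four variants, one target individual.
genotype = [1, 1, 0, 1]
pool_freq = [0.30, 0.25, 0.10, 0.40]
ref_freq = [0.20, 0.20, 0.15, 0.35]

full = lrt_score(genotype, pool_freq, ref_freq)
reduced = lrt_score(genotype, pool_freq, ref_freq, suppressed=[0])
threshold = 0.5  # hypothetical fixed attacker threshold
print(claims_membership(full, threshold), claims_membership(reduced, threshold))  # True False
```

In this toy case, suppressing variant 0 pushes the target's score below the threshold, which is the privacy gain; the utility cost is that the release no longer reports that variant. The approaches described above choose which variants to suppress or modify by optimization rather than by hand.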

Updated: 2023-07-01