An Entropy Approach to Disclosure Risk Assessment: Lessons from Real Applications and Simulated Domains


Decision Support Systems (IF 6.7), Pub Date: 2010-11-24, DOI: 10.1016/j.dss.2010.11.014
Edoardo M. Airoldi, Xue Bai, Bradley A. Malin

We live in an increasingly mobile world, which leads to the duplication of information across domains. Though organizations attempt to obscure the identities of their constituents when sharing information for worthwhile purposes, such as basic research, the uncoordinated nature of such environments can lead to privacy vulnerabilities. For instance, disparate healthcare providers can collect information on the same patient. Federal policy requires that such providers share “de-identified” sensitive data, such as biomedical (e.g., clinical and genomic) records. But at the same time, such providers can share identified information, devoid of sensitive biomedical data, for administrative functions. On a provider-by-provider basis, the biomedical and identified records appear unrelated; however, links can be established when multiple providers' databases are studied jointly. The problem, known as trail disclosure, is a generalized phenomenon and occurs because an individual's location access pattern can be matched across the shared databases. Due to technical and legal constraints, it is often difficult to coordinate between providers, and thus it is critical to assess the disclosure risk in distributed environments so that we can develop techniques to mitigate such risks. Research on privacy protection has so far focused on developing technologies to suppress or encrypt identifiers associated with sensitive information. There is a growing body of work on the formal assessment of the disclosure risk of database entries in publicly shared databases, but less attention has been paid to the distributed setting. In this research, we review the trail disclosure problem in several domains with known vulnerabilities and show that disclosure risk is influenced by the distribution of how people visit service providers. Based on empirical evidence, we propose an entropy metric for assessing such risk in shared databases prior to their release. This metric assesses risk by leveraging the statistical characteristics of a visit distribution, as opposed to person-level data. It is computationally efficient and superior to existing risk assessment methods, which rely on ad hoc assessments that are often computationally expensive and unreliable. We evaluate our approach on a range of location access patterns in simulated environments. Our results demonstrate that the approach is effective at estimating trail disclosure risks and that the amount of self-information contained in a distributed system is one of the main driving factors.
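
The abstract does not state the metric's formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the standard Shannon entropy, H = -Σ p_i log2 p_i, applied to the share of visits each location receives. For comparison it also computes a person-level quantity: the fraction of people whose set of visited locations is unique. The function names and the toy `trails` data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def visit_distribution(trails):
    """Share of all recorded visits that falls on each location.

    `trails` maps a person id to the set of locations that person visited.
    Only aggregate counts per location are used, not who made each visit.
    """
    counts = Counter(loc for visited in trails.values() for loc in visited)
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()}


def unique_trail_fraction(trails):
    """Fraction of people whose set of visited locations nobody else shares.

    This needs person-level data and is shown only as a point of comparison.
    """
    pattern_counts = Counter(frozenset(v) for v in trails.values())
    unique = sum(
        1 for v in trails.values() if pattern_counts[frozenset(v)] == 1
    )
    return unique / len(trails)


# Hypothetical toy data: four people, three providers (A, B, C).
trails = {
    "p1": {"A", "B"},
    "p2": {"A"},
    "p3": {"A"},
    "p4": {"B", "C"},
}

dist = visit_distribution(trails)
print("visit shares:", dist)
print("entropy of visit distribution (bits):", shannon_entropy(dist.values()))
print("fraction of unique trails:", unique_trail_fraction(trails))
```

The entropy here depends only on aggregate visit shares, which matches the abstract's point about using statistical characteristics of the visit distribution instead of person-level data. How the paper maps such a quantity to an actual risk estimate is not given in the abstract and is not reproduced here.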


Updated: 2010-11-24
