z-anonymity: Zero-Delay Anonymization for Data Streams, arXiv - CS - Data Structures and Algorithms



arXiv - CS - Data Structures and Algorithms. Pub Date: 2021-06-14, DOI: arxiv-2106.07534
Nikhil Jha, Thomas Favale, Luca Vassio, Martino Trevisan, Marco Mellia

With the advent of big data and the birth of data markets that sell personal information, individuals' privacy is of utmost importance. The classical response is anonymization, i.e., sanitizing the information that can directly or indirectly allow users' re-identification. The most popular solution in the literature is k-anonymity. However, k-anonymity is hard to achieve on a continuous stream of data, as well as when the number of dimensions becomes high.

In this paper, we propose a novel anonymization property called z-anonymity. Unlike k-anonymity, it can be achieved with zero delay on data streams, and it is well suited to high-dimensional data. The idea at the base of z-anonymity is to release an attribute (an atomic piece of information) about a user only if at least z - 1 other users have presented the same attribute in a past time window. z-anonymity is weaker than k-anonymity, since it does not work on combinations of attributes but treats them individually. We present a probabilistic framework to map z-anonymity onto the k-anonymity property. Our results show that a proper choice of the z-anonymity parameters makes it likely that the data curator obtains a k-anonymized dataset, with a precisely measurable probability. We also evaluate a real use case, in which we consider the website visits of a population of users, and show that z-anonymity can work in practice for obtaining k-anonymity too.
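To make the release rule concrete, here is a minimal Python sketch under stated assumptions: observations arrive as (timestamp, user, attribute) tuples in non-decreasing time order, and the "past time window" covers observations at most `delta_t` older than the current one. The class `ZAnonymizer`, the helper `is_k_anonymous`, and the parameter names are hypothetical illustrations, not the authors' implementation. The k-anonymity check uses one simple assumed notion (each user's full set of released attributes must be shared by at least k users), which may differ from the definition used in the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Iterator, Set, Tuple

Event = Tuple[float, Hashable, Hashable]  # (timestamp, user, attribute)


class ZAnonymizer:
    """Hypothetical streaming filter: release (user, attribute) only if at least
    z - 1 *other* users presented the same attribute within the last delta_t."""

    def __init__(self, z: int, delta_t: float) -> None:
        if z < 1:
            raise ValueError("z must be >= 1")
        self.z = z
        self.delta_t = delta_t
        # attribute -> {user: timestamp of that user's latest observation}
        self._last_seen: Dict[Hashable, Dict[Hashable, float]] = defaultdict(dict)

    def process(self, t: float, user: Hashable, attribute: Hashable) -> bool:
        """Record one observation and decide immediately (zero delay) whether to
        release it. Assumes events arrive in non-decreasing timestamp order."""
        seen = self._last_seen[attribute]
        # Evict users whose latest observation of this attribute left the window.
        cutoff = t - self.delta_t
        for u in [u for u, ts in seen.items() if ts < cutoff]:
            del seen[u]
        others = sum(1 for u in seen if u != user)
        # Every observation counts toward later decisions, released or not.
        seen[user] = t
        return others >= self.z - 1

    def run(self, events: Iterable[Event]) -> Iterator[Event]:
        """Yield only the events that satisfy the z-anonymity condition."""
        for t, user, attribute in events:
            if self.process(t, user, attribute):
                yield (t, user, attribute)


def is_k_anonymous(released: Iterable[Event], k: int) -> bool:
    """Assumed notion: treat each user's set of released attributes as their
    record and require every distinct record to be shared by >= k users."""
    per_user: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for _, user, attribute in released:
        per_user[user].add(attribute)
    counts = Counter(frozenset(attrs) for attrs in per_user.values())
    return all(c >= k for c in counts.values())


if __name__ == "__main__":
    stream = [
        (0, "alice", "A"), (1, "bob", "A"), (2, "bob", "B"),
        (3, "carol", "B"), (4, "alice", "A"), (5, "bob", "B"),
    ]
    released = list(ZAnonymizer(z=2, delta_t=10).run(stream))
    print(released)  # [(1, 'bob', 'A'), (3, 'carol', 'B'), (4, 'alice', 'A'), (5, 'bob', 'B')]
    print(is_k_anonymous(released, k=2))  # False
```

In this toy stream each attribute is released only once a second user has presented it, yet every user's combination of released attributes (bob: {A, B}, carol: {B}, alice: {A}) is unique, which illustrates why z-anonymity, working on attributes individually, does not by itself guarantee k-anonymity. The per-attribute eviction here is linear in the number of users seen for that attribute; a time-ordered structure per attribute could make eviction cheaper on large streams.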


Updated: 2021-06-15
