Semantic Diversity: Privacy Considering Distance Between Values of Sensitive Attribute, Computers & Security



Computers & Security (IF 5.6), Pub Date: 2020-07-01, DOI: 10.1016/j.cose.2020.101823
Keiichiro Oishi, Yuichi Sei, Yasuyuki Tahara, Akihiko Ohsuga

Abstract: A database that contains personal information and is collected by crowdsensing can be used for various purposes, so database holders may want to share their databases with other organizations. However, since a database contains information about individuals, database recipients must take privacy concerns into consideration. One of the mainstream privacy protection indicators, l-diversity, guarantees that the probability of identifying an individual's sensitive attribute value in a database is less than 1/l. However, when the sensitive attribute has several semantically similar values, a database anonymized to satisfy l-diversity may still fail to provide actual diversity. For example, if the anonymized database satisfies 3-diversity, an attacker may learn that the candidates for Alice's disease are HIV-1(M), HIV-1(N), and HIV-2. In this case, the attacker can conclude that Alice has HIV, although the exact type remains unknown. To address this inability of existing l-diversity to account for actual diversity, we propose a novel privacy indicator, (l, d)-semantic diversity, and an algorithm that anonymizes a database to satisfy (l, d)-semantic diversity. Because the output of the anonymizing algorithm is difficult to understand, we also propose an analysis algorithm suited to it. The proposed algorithms were experimentally evaluated using synthetic and real datasets.
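The HIV example can be made concrete with a small sketch. The abstract does not give the formal definition of (l, d)-semantic diversity, so the check below rests on one assumed reading: a group passes if it contains at least l distinct sensitive values that are pairwise at least distance d apart. The taxonomy, the tree-edge distance, and the function names are all hypothetical illustrations, not the authors' definitions or algorithm, and `is_l_diverse` uses only the simplest "distinct values" form of l-diversity.

```python
from itertools import combinations

# Hypothetical taxonomy (an assumption for illustration): each sensitive
# value maps to its path from the root of a disease hierarchy.
TAXONOMY = {
    "HIV-1(M)": ("disease", "HIV", "HIV-1", "HIV-1(M)"),
    "HIV-1(N)": ("disease", "HIV", "HIV-1", "HIV-1(N)"),
    "HIV-2": ("disease", "HIV", "HIV-2"),
    "Influenza": ("disease", "respiratory", "Influenza"),
    "Diabetes": ("disease", "metabolic", "Diabetes"),
}


def distance(a, b):
    """Number of tree edges between two values (assumed distance measure)."""
    path_a, path_b = TAXONOMY[a], TAXONOMY[b]
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def is_l_diverse(values, l):
    """Distinct l-diversity: the group holds at least l different values."""
    return len(set(values)) >= l


def is_semantically_diverse(values, l, d):
    """Assumed reading of (l, d)-semantic diversity: some l distinct values
    in the group are pairwise at least d apart."""
    distinct = sorted(set(values))
    return any(
        all(distance(a, b) >= d for a, b in combinations(subset, 2))
        for subset in combinations(distinct, l)
    )


hiv_group = ["HIV-1(M)", "HIV-1(N)", "HIV-2"]
mixed_group = ["HIV-1(M)", "Influenza", "Diabetes"]

print(is_l_diverse(hiv_group, 3))                # 3 distinct values
print(is_semantically_diverse(hiv_group, 3, 3))  # values too close together
print(is_semantically_diverse(mixed_group, 3, 3))
```

Under these assumptions, the group from the abstract satisfies 3-diversity but fails the distance-aware check, because HIV-1(M) and HIV-1(N) are only two edges apart. The mixed group passes both checks. The brute-force search over subsets is only practical for small groups.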


Updated: 2020-07-01
