Minimizing information loss in shared data: Hiding frequent patterns with multiple sensitive support thresholds, Statistical Analysis and Data Mining



Statistical Analysis and Data Mining (IF 1.3), Pub Date: 2020-04-20, DOI: 10.1002/sam.11458
Belgin Ergenç Bostanoǧlu, Ahmet Cumhur Öztürk


Privacy preserving data mining (PPDM) is the process of protecting sensitive knowledge from being discovered by data mining techniques when data is shared. Privacy preserving frequent itemset mining (PPFIM) is a subtask of PPDM and an NP‐hard problem. Its objective is to modify a given database so that none of the database owner's sensitive itemsets can be obtained from the modified database by any frequent itemset mining technique. The main challenge of PPFIM is to minimize the distortion of the data and of nonsensitive knowledge while sanitizing all given sensitive itemsets. Distortion‐based sensitive itemset hiding algorithms sanitize the database by decreasing the support of each sensitive itemset below a predefined sensitive threshold. Most distortion‐based itemset hiding algorithms allow the database owner to define only a single sensitive threshold that applies to every sensitive itemset. This limits the database owner, since the importance of each sensitive itemset varies. In this paper we propose a distortion‐based itemset hiding algorithm that allows the database owner to assign multiple sensitive thresholds: the itemset oriented pseudo graph based sanitization (IPGBS) algorithm. The purpose of the IPGBS algorithm is to hide all sensitive itemsets with minimum distortion of the data and nonsensitive knowledge. To this end, the IPGBS algorithm modifies as few transactions and as little transaction content as possible. The performance of the IPGBS algorithm is evaluated against two counterpart algorithms on four different databases. The results show that the IPGBS algorithm performs better in terms of nonsensitive frequent itemset loss on both dense and sparse databases. It also achieves considerably good results in terms of the number of transactions modified, number of items deleted, execution time, and total memory allocation.
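To make the idea of distortion-based hiding with per-itemset thresholds concrete, the following is a minimal Python sketch. It is not the IPGBS algorithm, whose pseudo graph structure the abstract does not describe; it is a naive greedy baseline written for illustration only. The function names `support` and `naive_hide`, the use of absolute support counts as thresholds (assumed to be at least 1), and the rule of deleting the alphabetically smallest item are all assumptions of this sketch.

```python
from typing import Dict, FrozenSet, List, Set


def support(db: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def naive_hide(
    db: List[Set[str]], thresholds: Dict[FrozenSet[str], int]
) -> List[Set[str]]:
    """Naive greedy sanitization (illustrative, NOT the IPGBS algorithm).

    `thresholds` maps each sensitive itemset to its own absolute support
    threshold (assumed >= 1). An itemset counts as hidden once its support
    is strictly below its threshold. Items are only ever deleted, so hiding
    one itemset can never raise the support of another.
    """
    sanitized = [set(t) for t in db]  # work on a copy, keep the input intact
    for itemset, threshold in thresholds.items():
        count = support(sanitized, itemset)
        for t in sanitized:
            if count < threshold:
                break  # this itemset is already hidden
            if itemset <= t:
                # Arbitrary victim choice: the alphabetically smallest item.
                t.discard(min(itemset))
                count -= 1
    return sanitized


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}, {"a", "c"}]
    # {a, b} may keep a support of at most 1; {b, c} must disappear entirely.
    thresholds = {frozenset({"a", "b"}): 2, frozenset({"b", "c"}): 1}
    for t in naive_hide(db, thresholds):
        print(sorted(t))
```

This baseline makes no attempt to limit side effects: it may delete more items and lose more nonsensitive frequent itemsets than necessary, which is exactly the kind of distortion the abstract says a hiding algorithm should minimize.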


Updated: 2020-04-20
