A utility based approach for data stream anonymization
Journal of Intelligent Information Systems (IF 2.3), Pub Date: 2019-10-08, DOI: 10.1007/s10844-019-00577-6
Ugur Sopaoglu, Osman Abul

Data streams are a good model for the dynamic, online, fast, and high-volume data requirements of today’s businesses. However, the sensitivity of data is usually an obstacle to deploying many data stream applications. To address this issue, many privacy-preserving models, including k-anonymity, have been adapted to data streams. Existing data stream anonymization frameworks address how to preserve as much data quality as possible under bounded delays. In this work, our main motivation is to minimize average delay while keeping data quality high. We claim that, in data stream processing tasks, data utility is a function of both data quality and data aging, and that optimizing one trades off against the other. To this end, we present a tunable data stream k-anonymization framework and an algorithm named UBDSA (Utility Based Approach for Data Stream Anonymization). To attain high-quality anonymity groups, UBDSA also introduces a new distance metric named CAIL (Cardinality Aware Information Loss). Our experimental evaluations compare the performance of UBDSA with existing approaches from the literature, and the results show that it achieves lower average delay and lower information loss.
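
The idea that utility depends on both data quality and data aging can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the weighted-sum form, the tuning parameter `alpha`, the normalizers `domain_width` and `max_delay`, and all function names are hypothetical. This is not the UBDSA algorithm or the CAIL metric, neither of which is defined in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Group:
    """A candidate anonymity group over one numeric quasi-identifier."""
    values: List[float]   # quasi-identifier values of the buffered tuples
    arrivals: List[int]   # arrival time of each tuple, same order as values


def is_k_anonymous(group: Group, k: int) -> bool:
    """A group can be published only if it holds at least k tuples."""
    return len(group.values) >= k


def information_loss(group: Group, domain_width: float) -> float:
    """Width of the generalized interval, normalized to [0, 1] (assumed measure)."""
    return (max(group.values) - min(group.values)) / domain_width


def average_delay(group: Group, now: int) -> float:
    """Mean time the tuples in the group have waited in the buffer."""
    return sum(now - t for t in group.arrivals) / len(group.arrivals)


def utility(group: Group, now: int, domain_width: float,
            max_delay: float, alpha: float) -> float:
    """Hypothetical utility: alpha weights quality, (1 - alpha) weights freshness."""
    quality = 1.0 - information_loss(group, domain_width)
    freshness = 1.0 - min(average_delay(group, now) / max_delay, 1.0)
    return alpha * quality + (1.0 - alpha) * freshness


# Usage: three tuples with ages 30, 32, 35 that arrived at times 0, 2, 4.
g = Group(values=[30.0, 32.0, 35.0], arrivals=[0, 2, 4])
if is_k_anonymous(g, k=3):
    u = utility(g, now=5, domain_width=100.0, max_delay=10.0, alpha=0.5)
```

In this sketch, raising `alpha` favors waiting for tighter groups (higher quality), while lowering it favors publishing sooner (less aging), which mirrors the tradeoff the abstract describes.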

Updated: 2019-10-08