Privacy-aware data cleaning-as-a-service, Information Systems


Information Systems (IF 3.0), Pub Date: 2020-07-31, DOI: 10.1016/j.is.2020.101608
Yu Huang, Mostafa Milani, Fei Chiang

Data cleaning is a pervasive problem for organizations as they try to reap value from their data. Recent advances in networking and cloud computing technology have fueled a new computing paradigm called Database-as-a-Service, where data management tasks are outsourced to large service providers. In this paper, we consider a Data Cleaning-as-a-Service model that allows a client to interact with a data cleaning provider who hosts curated and sensitive data. We present PACAS, a Privacy-Aware data Cleaning-As-a-Service model that facilitates interaction between the parties, in which the client issues query requests for data and the service provider uses a data pricing scheme that computes prices according to data sensitivity. We propose new extensions to the model to define generalized data repairs that obfuscate sensitive data to allow data sharing between the client and service provider. We present a new semantic distance measure to quantify the utility of such repairs, and we redefine the notion of consistency in the presence of generalized values. The PACAS model uses (X, Y, L)-anonymity, which extends existing data publishing techniques to consider the semantics in the data while protecting sensitive values. Our evaluation over real data shows that PACAS safeguards semantically related sensitive values and provides lower repair errors than existing privacy-aware cleaning techniques.
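The abstract does not spell out how generalized values or the semantic distance measure are defined, so the following is only a minimal sketch of the general idea, not the PACAS definitions. It assumes a value generalization hierarchy in which a sensitive value can be replaced by one of its ancestors, and it uses path length through the lowest common ancestor as a stand-in distance. The hierarchy contents and the names PARENT, ancestors, generalize, covers, and hierarchy_distance are all hypothetical.

```python
# Hypothetical value generalization hierarchy (child -> parent).
# The values and structure are illustrative and do not come from the paper.
PARENT = {
    "ibuprofen": "NSAID",
    "naproxen": "NSAID",
    "NSAID": "analgesic",
    "acetaminophen": "analgesic",
    "analgesic": "drug",
}


def ancestors(value):
    """Return the chain [value, parent, ..., root]."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(value, levels):
    """Replace a value by its ancestor `levels` steps up, capped at the root."""
    chain = ancestors(value)
    return chain[min(levels, len(chain) - 1)]


def covers(general, value):
    """True if `general` equals `value` or is one of its ancestors."""
    return general in ancestors(value)


def hierarchy_distance(a, b):
    """Count the edges between a and b via their lowest common ancestor.

    This is an assumed stand-in, not the paper's semantic distance measure.
    """
    chain_a, chain_b = ancestors(a), ancestors(b)
    depth_b = {v: i for i, v in enumerate(chain_b)}
    for i, v in enumerate(chain_a):
        if v in depth_b:
            return i + depth_b[v]
    raise ValueError("values share no common ancestor")
```

Under these assumptions, a repair that returns "NSAID" in place of "ibuprofen" hides the exact value while staying close to it in the hierarchy, which is the kind of trade-off between privacy and repair utility that the abstract describes.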


Updated: 2020-07-31
