In-Memory Data Anonymization Using Scalable and High Performance RDD Design
Electronics (IF 2.6), Pub Date: 2020-10-20, DOI: 10.3390/electronics9101732
Sibghat Ullah Bazai , Julian Jang-Jaccard

Recent studies in data anonymization techniques have primarily focused on MapReduce. However, these existing MapReduce-based approaches often incur significant performance overhead due to inappropriate data allocation, expensive disk I/O access and network transfer, and a lack of support for iterative tasks. We propose “SparkDA”, a novel anonymization technique designed to take full advantage of the Spark platform to generate privacy-preserving anonymized datasets as efficiently as possible. Our proposal offers better partition control, in-memory operation, and cache management for the iterative operations on which data anonymization processing relies heavily. Our proposal is based on Spark’s Resilient Distributed Dataset (RDD) and uses two critical RDD operations, FlatMapRDD and ReduceByKeyRDD. The experimental results demonstrate that our proposal outperforms the existing approaches in terms of performance and scalability while maintaining high levels of data privacy and utility. This indicates that our proposal can be used in a wider range of big data applications that demand privacy.
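The flatMap/reduceByKey pattern named in the abstract can be illustrated with a small, local sketch. It does not use Spark and is not the SparkDA algorithm from the paper: `flat_map` and `reduce_by_key` are hypothetical pure-Python stand-ins that mimic the semantics of Spark's `RDD.flatMap` and `RDD.reduceByKey`, and the generalization scheme (widening age bands, masking trailing ZIP digits) and the k-anonymity check are illustrative assumptions. The loop over generalization levels shows why iterative anonymization benefits from keeping the input in memory.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")
K = TypeVar("K")
V = TypeVar("V")


def flat_map(records: Iterable[T], fn: Callable[[T], Iterable[U]]) -> List[U]:
    """Hypothetical local stand-in for RDD.flatMap: each input yields zero or more outputs."""
    return [out for rec in records for out in fn(rec)]


def reduce_by_key(pairs: Iterable[Tuple[K, V]], fn: Callable[[V, V], V]) -> Dict[K, V]:
    """Hypothetical local stand-in for RDD.reduceByKey: merge all values sharing a key."""
    acc: Dict[K, V] = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc


def generalize(record: Tuple[int, str], level: int) -> Tuple[str, str]:
    """Assumed generalization: widen the age band and mask trailing ZIP digits."""
    age, zipcode = record
    width = 10 ** level  # level 0 -> exact age, 1 -> 10-year bands, 2 -> 100-year bands
    low = (age // width) * width
    masked = zipcode[: len(zipcode) - level] + "*" * level
    return (f"{low}-{low + width - 1}", masked)


def equivalence_class_sizes(records: List[Tuple[int, str]], level: int) -> Dict[Tuple[str, str], int]:
    # flatMap step: emit (generalized quasi-identifier, 1) for every record.
    pairs = flat_map(records, lambda r: [(generalize(r, level), 1)])
    # reduceByKey step: count how many records share each generalized quasi-identifier.
    return reduce_by_key(pairs, lambda a, b: a + b)


def is_k_anonymous(records: List[Tuple[int, str]], level: int, k: int) -> bool:
    # Every equivalence class must contain at least k records.
    return all(n >= k for n in equivalence_class_sizes(records, level).values())


def minimal_level(records: List[Tuple[int, str]], k: int, max_level: int = 3) -> Optional[int]:
    # Iterate over generalization levels. In Spark, the input RDD could be cached
    # so each pass reads it from memory instead of recomputing it from disk.
    for level in range(max_level + 1):
        if is_k_anonymous(records, level, k):
            return level
    return None
```

For example, with the records `(23, "53711")`, `(27, "53712")`, `(35, "53715")` and `(38, "53719")`, level 0 leaves every record unique, while level 1 produces two equivalence classes, `("20-29", "5371*")` and `("30-39", "5371*")`, of two records each, so `minimal_level(records, 2)` returns 1.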

Updated: 2020-10-20