Application-Aware Big Data Deduplication in Cloud Environment
IEEE Transactions on Cloud Computing (IF 6.5), Pub Date: 2019-10-01, DOI: 10.1109/tcc.2017.2710043
Yinjin Fu, Nong Xiao, Hong Jiang, Guyu Hu, Weiwei Chen

Deduplication has become a widely deployed technology in cloud data centers to improve IT resource efficiency. However, traditional techniques face a great challenge in big data deduplication: striking a sensible tradeoff between the conflicting goals of scalable deduplication throughput and a high duplicate elimination ratio. We propose AppDedupe, an application-aware, scalable, inline distributed deduplication framework for cloud environments, which meets this challenge by exploiting application awareness, data similarity, and locality to optimize distributed deduplication with inter-node two-tiered data routing and intra-node application-aware deduplication. It first distributes application data at the file level with application-aware routing to preserve application locality, then assigns similar application data to the same storage node at super-chunk granularity using a handprinting-based stateful data routing scheme, which maintains high global deduplication efficiency while balancing the workload across nodes. AppDedupe builds application-aware similarity indices with super-chunk handprints to speed up the intra-node deduplication process. Our experimental evaluation of AppDedupe against state-of-the-art schemes, driven by real-world datasets, demonstrates that AppDedupe achieves the highest global deduplication efficiency, with higher global deduplication effectiveness than the high-overhead and poorly scalable traditional scheme, at an overhead only slightly higher than that of the scalable approaches with low duplicate elimination ratios.
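
The routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so everything below is an assumption made for illustration. In particular, the sketch assumes fixed-size chunking, SHA-1 chunk fingerprints, a handprint defined as the k smallest fingerprints of a super-chunk, application type inferred from the file extension, and a load discount based on the number of stored chunks. The names `Node`, `route`, and `store_super_chunk` are hypothetical.

```python
import hashlib
import os
from dataclasses import dataclass, field

CHUNK_SIZE = 4096     # assumption: fixed-size chunks, for brevity
HANDPRINT_SIZE = 4    # assumption: representative fingerprints per super-chunk


def app_type(filename: str) -> str:
    """Tier 1 (simplified): classify a file by its extension."""
    return os.path.splitext(filename)[1].lower() or "unknown"


def chunk_fingerprints(data: bytes) -> list[str]:
    """Split a super-chunk into chunks and fingerprint each one."""
    return [hashlib.sha1(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]


def handprint(fingerprints: list[str], k: int = HANDPRINT_SIZE) -> set[str]:
    """Assumed handprint: the k smallest distinct chunk fingerprints."""
    return set(sorted(set(fingerprints))[:k])


@dataclass
class Node:
    name: str
    # Application-aware similarity index: app type -> representative fingerprints.
    similarity_index: dict[str, set[str]] = field(default_factory=dict)
    chunk_store: set[str] = field(default_factory=set)

    def resemblance(self, app: str, hp: set[str]) -> int:
        """Count handprint fingerprints already indexed for this application."""
        return len(hp & self.similarity_index.get(app, set()))

    def store(self, app: str, fps: list[str], hp: set[str]) -> int:
        """Store only unseen chunks; return how many were new."""
        new = set(fps) - self.chunk_store
        self.chunk_store |= new
        self.similarity_index.setdefault(app, set()).update(hp)
        return len(new)


def route(nodes: list[Node], app: str, hp: set[str]) -> Node:
    """Tier 2 (stateful): prefer the most similar node, penalize overloaded ones."""
    avg = sum(len(n.chunk_store) for n in nodes) / len(nodes)

    def key(n: Node):
        usage = len(n.chunk_store) / avg if avg else 0.0
        score = n.resemblance(app, hp) / max(usage, 1.0)
        # Ties on similarity go to the node holding fewer chunks.
        return (score, -len(n.chunk_store))

    return max(nodes, key=key)


def store_super_chunk(nodes: list[Node], filename: str, data: bytes):
    """Route one super-chunk and deduplicate it on the chosen node."""
    app = app_type(filename)
    fps = chunk_fingerprints(data)
    hp = handprint(fps)
    node = route(nodes, app, hp)
    return node, node.store(app, fps, hp)
```

In this sketch, a super-chunk whose handprint overlaps a node's index for the same application type is sent to that node, so its duplicate chunks are found locally. A super-chunk that resembles nothing already stored goes to the least loaded node. Comparing only a few representative fingerprints per super-chunk, rather than every chunk fingerprint, is what keeps the routing decision cheap.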

Updated: 2019-10-01