Improving the Performance of Deduplication-Based Storage Cache via Content-Driven Cache Management Methods
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-07-29, DOI: 10.1109/tpds.2020.3012704
Yujuan Tan, Congcong Xu, Jing Xie, Zhichao Yan, Hong Jiang, Witawas Srisa-an, Xianzhang Chen, Duo Liu

Data deduplication, a proven technology for effective data reduction in backup and archiving storage systems, also shows promise for storage caches: removing redundant data increases their logical space capacity. However, our in-depth evaluation of existing deduplication-aware caching algorithms reveals that they only work well when the cached block size is set to 4 KB. Unfortunately, modern storage systems often set the block size much larger than 4 KB, and in this scenario the overall performance of these caching schemes drops below that of conventional replacement algorithms without any deduplication. There are several reasons for this degradation. The first is the deduplication overhead, the time spent generating data fingerprints and using them to identify duplicate data, which offsets the benefits of deduplication. The second is the extremely low cache space utilization caused by read and write alignment. The third is that existing algorithms exploit only access locality to make block replacement decisions, missing the opportunity to leverage content usage patterns, such as the intensity of content redundancy and sharing in deduplication-based storage caches, to further improve performance. We propose CDAC, a Content-driven Deduplication-Aware Cache, to address this problem. CDAC focuses on exploiting the content redundancy within blocks and the intensity of content sharing among source addresses in its cache management strategies. We have implemented CDAC on top of the LRU and ARC algorithms, yielding CDAC-LRU and CDAC-ARC, respectively. Our extensive experimental results show that CDAC-LRU and CDAC-ARC outperform the state-of-the-art deduplication-aware caching algorithms, D-LRU and D-ARC, by up to 23.83X (3.23X on average) in read cache hit ratio and by up to 53.3 percent (49.8 percent on average) in IOPS, under a real-world mixed workload when the cache size ranges from 20 to 50 percent of the workload size and the block size ranges from 4 KB to 32 KB.
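To make the content-driven idea concrete, below is a minimal Python sketch of a deduplication-aware cache in the spirit described by the abstract. It is an illustrative assumption, not the paper's implementation: block contents are fingerprinted (SHA-1 here) so duplicate data occupies one cache slot, each fingerprint's reference count approximates the "sharing intensity" among source addresses, and eviction prefers cold blocks whose content few addresses share. The class name ContentDrivenCache and the EVICT_WINDOW parameter are hypothetical.

```python
import hashlib
from collections import OrderedDict

class ContentDrivenCache:
    """Minimal sketch of a content-driven, deduplication-aware cache
    (hypothetical simplification of the CDAC idea; real designs must
    also handle read/write alignment and fingerprint-index overhead)."""

    EVICT_WINDOW = 4  # LRU candidates compared at eviction (illustrative value)

    def __init__(self, capacity: int):
        self.capacity = capacity      # maximum number of unique blocks cached
        self.blocks = OrderedDict()   # fingerprint -> block bytes, LRU order
        self.addr_to_fp = {}          # source address -> fingerprint
        self.refs = {}                # fingerprint -> count of addresses sharing it

    @staticmethod
    def fingerprint(data: bytes) -> str:
        # Computing this hash is part of the "deduplication overhead"
        # the abstract identifies.
        return hashlib.sha1(data).hexdigest()

    def read(self, addr: int):
        fp = self.addr_to_fp.get(addr)
        if fp is None or fp not in self.blocks:
            return None               # cache miss
        self.blocks.move_to_end(fp)   # refresh recency
        return self.blocks[fp]

    def write(self, addr: int, data: bytes):
        fp = self.fingerprint(data)
        old_fp = self.addr_to_fp.get(addr)
        if old_fp != fp:
            if old_fp is not None:
                self.refs[old_fp] -= 1          # address leaves old content
            self.refs[fp] = self.refs.get(fp, 0) + 1
        self.addr_to_fp[addr] = fp
        if fp in self.blocks:                   # duplicate content: no new space
            self.blocks.move_to_end(fp)
            return
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[fp] = data

    def _evict(self):
        # Content-driven choice: among the least recently used candidates,
        # drop the block whose content is shared by the fewest addresses.
        candidates = list(self.blocks)[: self.EVICT_WINDOW]
        victim = min(candidates, key=lambda f: self.refs.get(f, 0))
        del self.blocks[victim]

if __name__ == "__main__":
    cache = ContentDrivenCache(capacity=2)
    cache.write(0, b"A" * 4096)
    cache.write(1, b"A" * 4096)   # duplicate content: one slot, two addresses
    cache.write(2, b"B" * 4096)   # still fits: only two unique blocks cached
    assert cache.read(0) == cache.read(1) == b"A" * 4096
```

The fixed-size eviction window above is only the simplest way to blend recency with sharing intensity; per the abstract, CDAC-LRU and CDAC-ARC integrate content redundancy and sharing signals into full LRU- and ARC-based replacement policies.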

Updated: 2020-07-29