Improving Restore Performance of Packed Datasets in Deduplication Systems via Reducing Persistent Fragmented Chunks
IEEE Transactions on Parallel and Distributed Systems (IF 5.3), Pub Date: 2020-07-01, DOI: 10.1109/tpds.2020.2972898
Yucheng Zhang, Min Fu, Xinyun Wu, Fang Wang, Qiang Wang, Chunzhi Wang, Xinhua Dong, Hongmu Han

Data deduplication, though efficient at eliminating redundancy in storage systems, introduces chunk fragmentation, which severely degrades restore performance. Rewriting algorithms have been proposed to reduce this fragmentation. Typically, backup software aggregates files into larger "tar"-type files for storage. We observe that, in tar-type datasets, a large number of Persistent Fragmented Chunks (PFCs) are rewritten by state-of-the-art rewriting algorithms in every backup, which severely impacts restore performance. We find that PFCs arise from the traditional strategy of storing them alongside other chunks in containers in order to preserve stream locality, which leaves them permanently in containers with low utilization. We propose DePFC to reduce PFCs. DePFC identifies PFCs, removes them from the stream-locality-preserving containers, and groups them together, so that the containers holding them are highly utilized by the subsequent backup and the chunks are not rewritten again. We further propose an FC Buffer, which avoids mistakenly rewriting PFCs and avoids grouping together PFCs that would cause restore-cache thrashing. Experimental results demonstrate that DePFC improves the restore performance of state-of-the-art rewriting algorithms by 44.24-89.42 percent while attaining comparable deduplication efficiency, and that the FC Buffer further improves restore performance.
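The abstract outlines the mechanism only at a high level. Below is a minimal sketch of the underlying idea, not the authors' implementation: it assumes a simple utilization-threshold rewriting policy, and the threshold, container size, and all names are hypothetical. Chunks referenced from containers whose utilization falls below the threshold count as fragmented; chunks that stay fragmented across two consecutive backups are persistent (PFCs); and DePFC-style grouping packs those PFCs into dedicated containers so the next backup finds them in well-utilized containers.

```python
# Toy model of the PFC idea: find chunks whose home containers a backup
# uses poorly, track which of them stay fragmented across consecutive
# backups, and pack those persistent ones into dedicated containers.
# All thresholds and names here are illustrative, not from the paper.

UTIL_THRESHOLD = 0.5   # hypothetical cutoff for "low utilization"
CONTAINER_SIZE = 4     # toy container capacity, in chunks

def container_utilization(chunks, referenced):
    """Fraction of a container's chunks that the current backup references."""
    return sum(1 for c in chunks if c in referenced) / len(chunks)

def find_fragmented(containers, referenced):
    """Referenced chunks that live in poorly utilized containers."""
    fragmented = set()
    for chunks in containers.values():
        if container_utilization(chunks, referenced) < UTIL_THRESHOLD:
            fragmented.update(c for c in chunks if c in referenced)
    return fragmented

def group_pfcs(prev_fragmented, curr_fragmented):
    """PFCs are chunks fragmented in two consecutive backups; pack them
    together into dedicated containers instead of rewriting them again."""
    pfcs = sorted(prev_fragmented & curr_fragmented)
    return [pfcs[i:i + CONTAINER_SIZE] for i in range(0, len(pfcs), CONTAINER_SIZE)]

if __name__ == "__main__":
    containers = {"c0": ["a", "b", "c", "d"], "c1": ["e", "f", "g", "h"]}
    backup1 = {"a", "e"}             # uses 1/4 of each container
    backup2 = {"a", "e", "f", "g"}   # c1 is now well utilized, c0 is not
    frag1 = find_fragmented(containers, backup1)  # {'a', 'e'}
    frag2 = find_fragmented(containers, backup2)  # {'a'}
    print(group_pfcs(frag1, frag2))               # [['a']]: 'a' is a PFC
```

In this reading, packing PFCs separately trades some stream locality for container utilization, which is consistent with the abstract's note that the FC Buffer is still needed to keep apart PFCs whose grouping would thrash the restore cache.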
