Improving Restore Performance for In-line Backup System Combining Deduplication and Delta Compression
IEEE Transactions on Parallel and Distributed Systems (IF 5.3), Pub Date: 2020-10-01, DOI: 10.1109/tpds.2020.2991030
Yucheng Zhang, Ye Yuan, Dan Feng, Chunzhi Wang, Xinyun Wu, Lingyu Yan, Deng Pan, Shuanghong Wang

Data deduplication, though efficient at removing duplicate chunks, introduces chunk fragmentation, which decreases restore performance. Rewriting algorithms have been proposed to reduce this fragmentation. Delta compression is often used as a complement to data deduplication to further improve storage efficiency. We observe that delta compression introduces a new type of chunk fragmentation, which stems from delta-compressing chunks whose base chunks are fragmented. This new type of fragmentation severely decreases restore performance and cannot be addressed by existing rewriting algorithms. To address this problem, we propose SDC, a scheme that performs post-deduplication delta compression only for chunks whose bases can be found directly in the restore cache. This eliminates the additional disk reads for base chunks and thus avoids the new type of chunk fragmentation. In addition, self-referenced chunks can be fragmented, which decreases restore performance, and these fragmented chunks can serve as bases and so decrease restore performance repeatedly. We propose a hybrid rewriting scheme for SDC to rewrite such fragmented chunks. Experimental results show that SDC improves the restore performance of the approach that directly performs delta compression after data deduplication by 2.9-16.9x, while achieving more than 95 percent of its compression gains.
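
The selective policy described in the abstract (delta-compress a chunk only when its base is already in the restore cache) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the container-level LRU cache, the `plan_backup` function, the record layout `(name, dup_container, base_container)`, and the single open container are all hypothetical assumptions chosen to keep the example small. The hybrid rewriting scheme is not modeled.

```python
from collections import OrderedDict


class LRUContainerCache:
    """Hypothetical stand-in for a restore cache: an LRU set of container IDs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._ids = OrderedDict()

    def access(self, container_id):
        # Mark the container as most recently used, evicting the oldest if full.
        self._ids[container_id] = None
        self._ids.move_to_end(container_id)
        while len(self._ids) > self.capacity:
            self._ids.popitem(last=False)

    def __contains__(self, container_id):
        return container_id in self._ids


def plan_backup(stream, cache, open_container):
    """Decide how to store each chunk of a backup stream.

    Each record is (name, dup_container, base_container):
      dup_container  -- container holding an identical chunk, or None
      base_container -- container holding a similar (base) chunk, or None
    Assumption: the cache is updated in stream order, mimicking what a later
    restore of the same stream would hold in memory.
    """
    plan = []
    for name, dup_container, base_container in stream:
        if dup_container is not None:
            # Exact duplicate: store a reference only.
            cache.access(dup_container)
            plan.append((name, "dedup"))
        elif base_container is not None and base_container in cache:
            # Base would already be cached at restore time: delta is "free".
            cache.access(base_container)
            cache.access(open_container)
            plan.append((name, "delta"))
        else:
            # No base, or base would need an extra disk read: store in full.
            cache.access(open_container)
            plan.append((name, "full"))
    return plan


stream = [
    ("a", 1, None),     # duplicate of a chunk in container 1
    ("b", None, 1),     # similar to a chunk in container 1 (cached)
    ("c", 2, None),     # duplicate in container 2; evicts container 1
    ("d", None, 1),     # base in container 1, no longer cached
    ("e", None, None),  # no duplicate and no similar chunk
]
plan = plan_backup(stream, LRUContainerCache(capacity=2), open_container=10)
```

In this toy run, chunk `b` is delta-compressed because container 1 is still cached, while chunk `d` has the same base container but is stored in full, since restoring it would otherwise require an extra disk read for the base.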

Updated: 2020-10-01