Boafft: Distributed Deduplication for Big Data Storage in the Cloud
IEEE Transactions on Cloud Computing ( IF 6.5 ) Pub Date : 2020-10-01 , DOI: 10.1109/tcc.2015.2511752
Shengmei Luo , Guangyan Zhang , Chengwen Wu , Samee U. Khan , Keqin Li

As data grows steadily within data centers, cloud storage systems continuously face challenges in saving storage capacity and in moving big data within an acceptable time frame. In this paper, we present Boafft, a cloud storage system with distributed deduplication. Boafft achieves scalable throughput and capacity by using multiple data servers to deduplicate data in parallel, with a minimal loss of deduplication ratio. First, Boafft uses an efficient data routing algorithm based on data similarity that reduces network overhead by quickly identifying the storage location. Second, Boafft maintains an in-memory similarity index in each data server that helps avoid a large number of random disk reads and writes, which in turn accelerates local data deduplication. Third, Boafft constructs a hot fingerprint cache in each data server based on access frequency, so as to improve the data deduplication ratio. Our comparative analysis with EMC's stateful routing algorithm reveals that Boafft can provide a comparatively high deduplication ratio with a low network bandwidth overhead. Moreover, Boafft makes better use of the storage space, with higher read/write bandwidth and good load balance.
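
The abstract does not spell out the routing algorithm, so the following is only a minimal sketch of how similarity-based routing with a per-server in-memory similarity index could work in general. Everything in it is an assumption made for illustration: the names `DataServer`, `characteristic_fingerprints`, and `route`, the use of SHA-1 as the chunk fingerprint, the choice of the `k` smallest fingerprints as a block's features, and the tie-break on stored volume. It should not be read as the authors' implementation.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    # Assumption: SHA-1 stands in for whatever chunk fingerprint the system uses.
    return hashlib.sha1(chunk).hexdigest()


def characteristic_fingerprints(chunks, k=4):
    # Assumption: a block is summarized by its k smallest distinct fingerprints,
    # so similar blocks tend to share features.
    return sorted({fingerprint(c) for c in chunks})[:k]


class DataServer:
    """Hypothetical data server holding a small in-memory similarity index."""

    def __init__(self, name):
        self.name = name
        self.similarity_index = set()  # feature fingerprints, kept in memory
        self.chunk_store = set()       # stands in for the full on-disk index

    def match_count(self, features):
        # Number of the block's features this server has already seen.
        return sum(1 for f in features if f in self.similarity_index)

    def store(self, chunks, features):
        # Local deduplication: a set keeps each fingerprint only once.
        self.chunk_store.update(fingerprint(c) for c in chunks)
        self.similarity_index.update(features)


def route(chunks, servers, k=4):
    # Send the block to the server sharing the most features with it;
    # on a tie, prefer the server that currently stores fewer chunks.
    features = characteristic_fingerprints(chunks, k)
    best = max(
        servers,
        key=lambda s: (s.match_count(features), -len(s.chunk_store)),
    )
    best.store(chunks, features)
    return best


if __name__ == "__main__":
    servers = [DataServer("a"), DataServer("b")]
    for block in (
        [b"alpha", b"beta", b"gamma", b"delta"],
        [b"alpha", b"beta", b"gamma", b"epsilon"],
        [b"one", b"two"],
    ):
        print(route(block, servers).name)
```

In this sketch only the small feature set is compared across servers, which is what keeps the routing traffic low; the full fingerprint lookup happens locally on the chosen server.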

Updated: 2020-10-01