Research on Multifeature Data Routing Strategy in Deduplication
Scientific Programming, Pub Date: 2020-10-14, DOI: 10.1155/2020/8869237
Qinlu He¹, Genqing Bian¹, Bilin Shao², Weiqi Zhang¹

Deduplication is a popular data reduction technology in storage systems. It finds and eliminates duplicate data, which reduces the required storage capacity, increases resource utilization, and saves storage costs. File features are a key factor in calculating the similarity between files, but similarity calculated from a single feature has limitations, especially for similar files. Storage node features reflect the load condition of each node and are a key factor to consider in data routing. This paper introduces a multifeature data routing strategy (DRMF). The strategy is built around the features of the cluster and comprises routing communication, file similarity calculation, and target node determination. Routing communication handles the exchange of information between the routing server and the storage nodes. The storage nodes calculate similarity against the files they have stored, and the file is then routed according to the information provided by the routing server. The routing server determines the target node according to the similarity results and the node load features. We design and implement a system prototype, develop a system to process the cluster features, and determine the specific parameters of the various features through experiments. Finally, we simulate multifeature and single-feature data routing and compare the deduplication rate and data skew of the two strategies. The experimental results show that, compared with the single-feature routing strategy MCS, DRMF improves the deduplication rate of the cluster while maintaining a lower data skew rate.
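The abstract does not give DRMF's similarity or scoring formulas, so the Python sketch below only illustrates the general idea of multifeature routing under stated assumptions. Each file is described by several feature sets (here hypothetical `chunks` fingerprints and a `meta` tag). Per-feature similarity is measured as a containment ratio (the fraction of a file's feature values already present on a node), and the per-feature scores are combined with weights. The routing server then picks the node with the highest similarity minus a load penalty. The names `FEATURE_WEIGHTS`, `ALPHA`, `BETA`, `route`, and `skew`, the logical/physical deduplication ratio, and the coefficient-of-variation skew metric are all assumptions for illustration, not the paper's definitions.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev

# Hypothetical per-feature weights; the abstract does not state the paper's values.
FEATURE_WEIGHTS = {"chunks": 0.7, "meta": 0.3}
ALPHA = 1.0  # assumed weight of file similarity
BETA = 0.5   # assumed weight of the node-load penalty


@dataclass
class StorageNode:
    name: str
    capacity: int  # hypothetical capacity, measured in unique chunks
    features: dict[str, set[str]] = field(
        default_factory=lambda: {k: set() for k in FEATURE_WEIGHTS})

    def load(self) -> float:
        # Fraction of capacity used by unique (deduplicated) chunks.
        return len(self.features["chunks"]) / self.capacity


def containment(file_vals: set[str], node_vals: set[str]) -> float:
    """Fraction of the file's feature values already present on the node."""
    return len(file_vals & node_vals) / len(file_vals) if file_vals else 0.0


def similarity(file: dict[str, set[str]], node: StorageNode) -> float:
    # Weighted combination of per-feature similarities (the "multifeature" part).
    return sum(w * containment(file[k], node.features[k])
               for k, w in FEATURE_WEIGHTS.items())


def route(file: dict[str, set[str]], nodes: list[StorageNode]) -> StorageNode:
    # Routing-server decision: favour similar data, penalise loaded nodes.
    # On a tie, max() returns the first node in the list.
    return max(nodes, key=lambda n: ALPHA * similarity(file, n) - BETA * n.load())


def store(file: dict[str, set[str]], node: StorageNode) -> None:
    for k in FEATURE_WEIGHTS:
        node.features[k] |= file[k]  # duplicate values are absorbed by set union


def skew(nodes: list[StorageNode]) -> float:
    # Coefficient of variation of per-node unique-chunk counts (assumed metric).
    sizes = [len(n.features["chunks"]) for n in nodes]
    m = mean(sizes)
    return pstdev(sizes) / m if m else 0.0


# Toy run with hypothetical files and two empty nodes.
nodes = [StorageNode("n1", capacity=100), StorageNode("n2", capacity=100)]
files = [
    {"chunks": {"a", "b", "c"}, "meta": {"pdf"}},
    {"chunks": {"a", "b", "d"}, "meta": {"pdf"}},
    {"chunks": {"x", "y"}, "meta": {"log"}},
]
routed, logical = [], 0
for f in files:
    target = route(f, nodes)
    routed.append(target.name)
    logical += len(f["chunks"])
    store(f, target)

physical = sum(len(n.features["chunks"]) for n in nodes)
ratio = logical / physical  # deduplication ratio: logical chunks / stored chunks
sk = skew(nodes)
print(routed, round(ratio, 3), round(sk, 3))
```

In this toy run the second file follows the first to `n1` because they share chunks and metadata, while the unrelated third file goes to the emptier `n2`. Raising `BETA` shifts the trade-off toward spreading load across nodes at the cost of fewer duplicates being co-located.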

Updated: 2020-10-14