A popularity-aware reconstruction technique in erasure-coded storage systems, Journal of Parallel and Distributed Computing



Journal of Parallel and Distributed Computing (IF 3.4), Pub Date: 2020-08-21, DOI: 10.1016/j.jpdc.2020.08.003
Ting Cao, Xiaopu Peng, Chaowei Zhang, Taha Khalid Al Tekreeti, Jianzhou Mao, Xiao Qin, Jianzhong Huang

In this study, we develop a novel data reconstruction technique for parallel storage systems housed in modern data centers. We advocate using erasure-coded storage systems to archive warm data (also known as unpopular data), which attract a limited number of accesses or updates. Unlike hot or cold data, warm data must be treated in a distinctive way to optimize system performance and storage-space utilization. We pay particular attention to efficient data reconstruction, in which faulty data nodes are rebuilt while the system continues to respond to I/O requests. To achieve this goal, we employ two machine-learning algorithms to offer online data reconstruction in erasure-coded storage systems. Our technique recovers faulty nodes while boosting read performance for requests that access data residing on those nodes. Our system relies on a clustering mechanism to group files into multiple clusters, each containing files that share similar features. Furthermore, we implement a prediction module that projects a list of future popular data by keeping track of historical I/O accesses. This popular-data list, in turn, predicts which files are likely to be accessed in the near future. The prediction module is responsible for computing similarities among users, thereby setting the priority levels of the data blocks to be reconstructed. We implement our data reconstruction scheme in an erasure-coded parallel storage system to recover files with guidance from the popular-data list. Our experimental results confirm that our system speeds up data recovery in parallel storage systems while maintaining high data access performance for online users.
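
The abstract does not specify the two machine-learning algorithms, so the following is only a minimal sketch of the general idea of popularity-guided reconstruction, not the authors' method. It assumes cosine similarity over per-user access counts as the user-similarity measure and omits the clustering step entirely. All names (`cosine_similarity`, `predict_popularity`, `reconstruction_order`) and the sample data are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two sparse access-count dicts {file: count}."""
    shared = set(a) & set(b)
    dot = sum(a[f] * b[f] for f in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_popularity(history, active_users):
    """Score each file by similarity-weighted historical accesses.

    history:      {user: {file: access_count}} built from past I/O requests.
    active_users: users currently issuing requests (an assumed input).
    """
    scores = defaultdict(float)
    for active in active_users:
        own = history.get(active, {})
        for other, accesses in history.items():
            # A user's own history counts fully; other users' histories
            # count in proportion to how similar their access patterns are.
            weight = 1.0 if other == active else cosine_similarity(own, accesses)
            for f, count in accesses.items():
                scores[f] += weight * count
    return dict(scores)


def reconstruction_order(lost_files, scores):
    """Order lost files from highest to lowest predicted popularity."""
    heap = [(-scores.get(f, 0.0), f) for f in lost_files]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Hypothetical access history: bob resembles alice, carol does not.
history = {
    "alice": {"f1": 5, "f2": 1},
    "bob": {"f1": 4, "f3": 2},
    "carol": {"f4": 7},
}
scores = predict_popularity(history, active_users=["alice"])
print(reconstruction_order(["f2", "f3", "f4", "f1"], scores))
# prints ['f1', 'f3', 'f2', 'f4']
```

In this sketch, files on a faulty node that similar users have accessed are rebuilt first, so reads that are likely to arrive soon are less likely to hit unreconstructed data. File `f4` is rebuilt last because only a dissimilar user has touched it.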


Updated: 2020-08-28
