A Multivariate Time Series Streaming Classifier for Predicting Hard Drive Failures [Application Notes]
IEEE Computational Intelligence Magazine (IF 9), Pub Date: 2022-01-12, DOI: 10.1109/mci.2021.3129962
Josu Ircio, Aizea Lojo, Jose A. Lozano, Usue Mori

Digital data storage systems such as hard drives can suffer breakdowns that cause the loss of stored data. Due to the cost of data and the damage that its loss entails, hard drive failure prediction is vital. In this context, the objective of this paper is to develop a method for detecting the beginning of hard drive malfunction using streaming SMART data, allowing the user to take actions before the breakdown occurs. This is a challenging task for two main reasons. First, there are not usually many examples of failed hard drives. Second, in these few available examples, hard drives are only identified and labeled as failed after complete breakdown occurs, but the exact moment when they begin to malfunction is usually unknown. Both these aspects significantly complicate the supervised learning of hard drive failure prediction models. To cope with these issues, the problem is addressed as a multidimensional time series streaming classification problem based on sliding windows. Moreover, as a solution to the highly imbalanced situation, the learned classifier is optimized to maximize the minimum recall of classes. Experimental results using the Backblaze benchmark dataset show that the proposed method reliably anticipates hard drive failures and obtains a higher balance between the recall values of both classes, failed and correct disks, compared to other state-of-the-art solutions.
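The two ideas named in the abstract, cutting a multivariate stream into sliding windows and scoring a classifier by the minimum recall across classes, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sliding_windows` and `min_recall`, the window width, the step, and the toy data are all assumptions chosen for illustration, and the abstract does not specify how the windows are built or how the optimization is carried out.

```python
import numpy as np


def sliding_windows(series, width, step=1):
    """Cut a (n_timesteps, n_features) series into overlapping windows.

    Returns an array of shape (n_windows, width, n_features).
    `width` and `step` are illustrative parameters, not values from the paper.
    """
    series = np.asarray(series)
    if series.ndim != 2:
        raise ValueError("expected a 2-D array: (n_timesteps, n_features)")
    n_timesteps, n_features = series.shape
    if n_timesteps < width:
        # Not enough observations yet to form a single window.
        return np.empty((0, width, n_features), dtype=series.dtype)
    starts = range(0, n_timesteps - width + 1, step)
    return np.stack([series[s:s + width] for s in starts])


def min_recall(y_true, y_pred):
    """Smallest per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(min(recalls))


# Toy stream: 6 daily readings of 2 hypothetical SMART attributes.
stream = np.arange(12).reshape(6, 2)
windows = sliding_windows(stream, width=3)
print(windows.shape)  # (4, 3, 2)

# Recall is 3/4 for class 0 and 1/2 for class 1, so the minimum is 0.5.
print(min_recall([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0]))  # 0.5

# A classifier that always predicts "healthy" on imbalanced data is
# 98% accurate, yet its minimum recall is 0.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
print(min_recall(y_true, y_pred))  # 0.0
```

The last case shows why a criterion such as minimum recall suits highly imbalanced data: accuracy rewards ignoring the rare failed class, whereas the minimum recall is only high when both failed and healthy drives are recognized.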

Updated: 2022-01-14