On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers, IEEE Transactions on Reliability


IEEE Transactions on Reliability (IF 5.9), Pub Date: 2020-07-27, DOI: 10.1109/tr.2020.3007127
Suayb S. Arslan, Engin Zeydan

Multiple disk failures are now observed frequently in big data centers, where thousands of drives operate simultaneously. Disks are typically protected by replication or erasure coding to guarantee a predetermined level of reliability. To optimize data protection, however, real-life disk failure trends need to be modeled appropriately. The classical approach is to estimate the probability density function of failures using nonparametric techniques such as kernel density estimation (KDE). These techniques are suboptimal when the true underlying density function is unknown, and insufficient data may lead to overfitting. In this article, we propose applying a set of transformations to the collected failure data to obtain an almost perfect regression in the transform domain. Then, by inverse transformation, we analytically estimate the failure density through the efficient computation of moment generating functions and, hence, the density functions. We also develop a visualization platform to extract useful statistical information such as the model-based mean time to failure. Our results indicate that, for other heavy-tailed data, the complex Gaussian hypergeometric distribution and the classical KDE approach can perform best if the overfitting problem can be avoided and the complexity burden is overcome. On the other hand, we show that the failure distribution exhibits a less complex Argus-like distribution after performing the Box–Cox transformation, up to appropriate scaling and shifting operations.
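The transform-then-invert idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes synthetic lognormal "lifetimes" in place of real disk data, and it fits a plain normal distribution in the Box–Cox domain as a stand-in for the Argus-like model the abstract reports. The density is mapped back to the original scale with the standard change-of-variables formula, and a model-based mean time to failure (MTTF) is compared with a KDE baseline. The function names `boxcox_density`, `mean_from_pdf`, and `demo` are hypothetical.

```python
import numpy as np
from scipy import special, stats


def boxcox_density(lifetimes):
    """Return (pdf, lam): a density on the original scale and the fitted lambda.

    Assumption: a normal distribution is fitted in the transform domain.
    The abstract reports an Argus-like shape there; the normal is used
    only to keep this sketch short.
    """
    x = np.asarray(lifetimes, dtype=float)
    y, lam = stats.boxcox(x)  # lambda chosen by maximum likelihood
    mu, sigma = y.mean(), y.std(ddof=1)

    def pdf(t):
        t = np.asarray(t, dtype=float)
        # Change of variables: f_X(t) = f_Y(g(t)) * g'(t), with g'(t) = t**(lam - 1).
        # For lam != 0 the Box-Cox range is bounded, so this is approximate.
        return stats.norm.pdf(special.boxcox(t, lam), mu, sigma) * t ** (lam - 1.0)

    return pdf, lam


def mean_from_pdf(pdf, grid):
    """Approximate the mean (MTTF) as the trapezoidal integral of t * pdf(t)."""
    f = grid * pdf(grid)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(grid)))


def demo(seed=0, n=2000):
    rng = np.random.default_rng(seed)
    # Synthetic right-skewed "lifetimes" in arbitrary units; not real disk data.
    x = rng.lognormal(mean=1.0, sigma=0.5, size=n)
    grid = np.linspace(1e-6, 5.0 * x.max(), 20001)

    pdf_bc, lam = boxcox_density(x)
    kde = stats.gaussian_kde(x)  # nonparametric baseline

    return {
        "lambda": float(lam),
        "sample_mean": float(x.mean()),
        "mttf_boxcox": mean_from_pdf(pdf_bc, grid),
        "mttf_kde": mean_from_pdf(kde, grid),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.4f}")
```

On such well-behaved synthetic data both MTTF estimates should land close to the sample mean. The sketch only shows the mechanics of fitting in a transform domain and inverting. It does not reproduce the paper's comparison between models.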


Updated: 2020-07-27
