Impact of Failure Prediction on Availability: Modeling and Comparative Analysis of Predictive and Reactive Methods
IEEE Transactions on Dependable and Secure Computing (IF 7.3), Pub Date: 2018-01-01, DOI: 10.1109/tdsc.2018.2806448
Igor Kaitovic, Miroslaw Malek

Predicting failures and acting proactively has the potential to improve availability: a correct prediction followed by a successful mitigation brings a reward in the form of reduced downtime. Conversely, each incorrect prediction may introduce additional downtime (a penalty). Therefore, depending on the quality of prediction and the system parameters, predictive fault-tolerance methods may improve or degrade availability compared with reactive ones. We first derive taxonomies of fault-tolerance techniques and policies to differentiate between reactive and proactive policies, with the latter further classified as systematic or predictive. To evaluate whether a predictive policy improves availability, we derive an analytical model for availability quantification. We use Markov chains to extend the steady-state availability equation to include precision and recall, penalty and reward, mitigation success probability, and a potential failure rate increase due to the prediction load. We also derive an A-measure to optimize failure prediction for increasing availability. We conclude that precision and recall have an impact on availability comparable to that of changing MTTF and MTTR. To validate the model, we also simulate and analyze the availability of a virtualized server with exponentially distributed failure and repair times.
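
For orientation, the reactive baseline that the abstract builds on can be sketched in a few lines of Python. This is not the authors' extended Markov model or their A-measure, neither of which is given in the abstract. It shows only the textbook steady-state availability, MTTF / (MTTF + MTTR), a small Monte Carlo check with exponentially distributed up and down times, and the standard definitions of precision and recall. All function names and parameter values are illustrative assumptions.

```python
import random


def steady_state_availability(mttf, mttr):
    """Textbook steady-state availability of a repairable system."""
    return mttf / (mttf + mttr)


def simulate_availability(mttf, mttr, cycles=100_000, seed=0):
    """Monte Carlo estimate: alternate exponential up and down periods.

    Assumption: one failure/repair cycle at a time, no prediction or
    mitigation, i.e. the purely reactive baseline.
    """
    rng = random.Random(seed)
    uptime = 0.0
    downtime = 0.0
    for _ in range(cycles):
        uptime += rng.expovariate(1.0 / mttf)    # time to next failure
        downtime += rng.expovariate(1.0 / mttr)  # time to repair
    return uptime / (uptime + downtime)


def precision_recall(true_pos, false_pos, false_neg):
    """Standard precision and recall of a failure predictor."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


if __name__ == "__main__":
    # Hypothetical parameters, in hours.
    mttf, mttr = 1000.0, 10.0
    print("analytic :", steady_state_availability(mttf, mttr))
    print("simulated:", simulate_availability(mttf, mttr))
    print("precision, recall:", precision_recall(8, 2, 2))
```

With enough cycles the simulated value should settle close to the analytic one. The paper's contribution is to extend this baseline with prediction quality, reward, penalty, and mitigation success, which the sketch deliberately leaves out.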

Updated: 2018-01-01