Improving Availability of Multicore Real-Time Systems Suffering Both Permanent and Transient Faults
IEEE Transactions on Computers (IF 3.7), Pub Date: 2019-12-01, DOI: 10.1109/tc.2019.2935042
Junlong Zhou, Xiaobo Sharon Hu, Yue Ma, Jin Sun, Tongquan Wei, Shiyan Hu

CMOS scaling has greatly increased concerns about both lifetime reliability due to permanent faults and soft-error reliability due to transient faults. Most existing works focus on only one of the two reliability concerns, but techniques used to increase one type of reliability can often adversely affect the other. A few efforts do consider both types of reliability together, using two different metrics to quantify them. However, for many systems, the user's concern is to maximize system availability by improving the mean time to failure (MTTF), regardless of whether the failure is caused by permanent or transient faults. Addressing this concern requires a uniform metric that measures the effect of both types of faults. This paper introduces a novel analytical expression for calculating the MTTF due to transient faults. Using this new formula and an existing method to evaluate system MTTF, we tackle the problem of maximizing availability for multicore real-time systems with consideration of permanent and transient faults. A framework is proposed to solve the system availability maximization problem. Experimental results on a hardware board and simulation results of synthetic tasks show that our scheme significantly improves system MTTF (and hence availability) compared with existing techniques.

Updated: 2019-12-01