Evaluating and Mitigating Neutrons Effects on COTS EdgeAI Accelerators
IEEE Transactions on Nuclear Science (IF 1.8), Pub Date: 2021-06-04, DOI: 10.1109/tns.2021.3086686
Sebastian Blower, Paolo Rech, Carlo Cazzaniga, Maria Kastriotou, Christopher D. Frost

EdgeAI is an emerging artificial intelligence (AI) accelerator technology capable of delivering improved AI performance at both a lower cost and a lower power level. Because these devices are intended for deployment in large quantities and in safety-critical environments, it is imperative to understand how single-event effects (SEEs) affect the reliability of this new family of devices and to propose efficient hardening solutions. Through neutron beam experiments and fault-injection analysis of a commercial off-the-shelf (COTS) EdgeAI device, we identify the device’s SEE failure modes, separate the error rate contributions of the device’s different resources, and characterize the device’s SEE reliability. During this analysis, we found that the vast majority of single-bit flips have no appreciable effect on the output. Based on this analysis, we propose a hardening solution that implements triple-modular redundancy (TMR) in the device without changing its physical architecture. We experimentally validate this solution and show that it corrects 96% of the misclassifications (critical errors) with nearly zero overhead.
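
As a rough illustration of the two ideas named in the abstract (single-bit flips and TMR), the sketch below flips one bit of a float32 score and takes a majority vote over three classification results. It is not the authors' implementation: the function names `flip_bit`, `classify`, and `tmr_vote`, the float32 score representation, and the argmax classifier are all assumptions made for this example, and the abstract does not say how redundancy is mapped onto the device.

```python
import struct
from collections import Counter


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its float32 encoding flipped.

    Bit 0 is the least significant mantissa bit, bit 31 is the sign bit.
    (Hypothetical helper for illustration only.)
    """
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def classify(scores):
    """Return the index of the highest score (a plain argmax)."""
    return max(range(len(scores)), key=scores.__getitem__)


def tmr_vote(a, b, c):
    """Majority vote over three redundant results.

    Returns the value at least two copies agree on, or None when all
    three disagree (an uncorrectable error in this simple scheme).
    """
    label, count = Counter([a, b, c]).most_common(1)[0]
    return label if count >= 2 else None
```

With scores `[0.1, 0.7, 0.2]`, flipping the lowest mantissa bit of `0.7` leaves the predicted class unchanged, whereas flipping its sign bit changes the prediction. If only one of three redundant runs is corrupted that way, `tmr_vote` still returns the class the two clean runs agree on.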

Updated: 2021-06-04