Soft errors in DNN accelerators: A comprehensive review
Microelectronics Reliability (IF 1.6), Pub Date: 2020-12-01, DOI: 10.1016/j.microrel.2020.113969
Younis Ibrahim, Haibin Wang, Junyang Liu, Jinghe Wei, Li Chen, Paolo Rech, Khalid Adam, Gang Guo

Abstract Deep learning tasks cover a broad range of domains and an even broader range of applications, from entertainment to highly safety-critical fields. Thus, Deep Neural Network (DNN) algorithms are deployed on diverse systems, from small embedded devices to data centers. DNN accelerators have proven to be a key enabler of this efficiency, as they execute these workloads far more efficiently than general-purpose CPUs, and they have therefore become the primary hardware for running DNN algorithms. However, these accelerators are susceptible to several types of faults. Soft errors pose a particular threat because the high degree of parallelism in these accelerators can propagate a single fault into multiple errors at subsequent levels, ultimately corrupting the model's predictions. This article presents a comprehensive review of the reliability of DNN accelerators. The study begins by examining the widely assumed claim that DNNs are inherently tolerant to faults. The available DNN accelerators are then systematically classified into several categories; each category is analyzed individually, and the commonly used accelerators are compared to answer the question: which accelerator is more reliable against transient faults? The concluding part of this review highlights the gray areas of DNN reliability and outlines future research directions that will enhance their applicability. This study is expected to benefit researchers in the areas of deep learning, DNN accelerators, and the reliability of this efficient paradigm.
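The abstract's central claim is that a single transient upset can spread into multiple erroneous values and eventually change a prediction. The following is a minimal sketch, not taken from the paper, of the kind of single-bit-flip fault injection commonly used to study this effect: the toy network, weight indices, and chosen bit are illustrative assumptions only.

```python
# Minimal sketch (illustrative, not from the paper): flip one bit in a
# float32 weight of a toy network and compare the output against the
# fault-free ("golden") run.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, w1, w2):
    # Toy two-layer "classifier": hidden layer with ReLU, then a linear output.
    return relu(x @ w1) @ w2

def flip_bit(value, bit):
    """Flip one bit of a float32 value (models a single transient upset)."""
    as_int = np.float32(value).view(np.uint32)
    return (as_int ^ np.uint32(1 << bit)).view(np.float32)

# Stand-in input and weights for a real DNN layer.
x = rng.standard_normal((1, 8)).astype(np.float32)
w1 = rng.standard_normal((8, 16)).astype(np.float32)
w2 = rng.standard_normal((16, 4)).astype(np.float32)

golden = forward(x, w1, w2)

# Inject a single-bit fault into one weight's exponent field (bit 30).
w1_faulty = w1.copy()
w1_faulty[3, 5] = flip_bit(w1_faulty[3, 5], 30)
faulty = forward(x, w1_faulty, w2)

print("golden argmax:", golden.argmax(), golden)
print("faulty argmax:", faulty.argmax(), faulty)
# Because the corrupted weight feeds many downstream activations, one upset
# can perturb several outputs and, in the worst case, change the prediction.
```

Flipping a high exponent bit is the worst-case choice here; flips in low mantissa bits often leave the argmax unchanged, which is the intuition behind the "DNNs are inherently fault tolerant" claim the review re-examines.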

Updated: 2020-12-01