Toward Functional Safety of Systolic Array-Based Deep Learning Hardware Accelerators
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8), Pub Date: 2021-01-18, DOI: 10.1109/tvlsi.2020.3048829
Shamik Kundu, Suvadeep Banerjee, Arnab Raha, Suriyaprakash Natarajan, Kanad Basu

High accuracy and ever-increasing computing power have made deep neural networks (DNNs) the algorithm of choice for various machine learning, computer vision, and image processing applications across the computing spectrum. To this end, Google developed the tensor processing unit (TPU) to accelerate the computationally intensive matrix multiplication operation of a DNN on its systolic array architecture. Faults manifested in the datapath of such a systolic array due to latent manufacturing defects or single-event effects may lead to functional safety (FuSa) violation. Although DNNs are known to resist minor perturbations with their inherent fault-tolerant characteristics, we show that the classification accuracy of the model plummets from 97.4% to 7.75% with a minimal fault rate of 0.0003% in the accelerator, implying catastrophic circumstances when deployed across mission-critical systems. Hence, to ensure FuSa of such accelerators, this article provides an extensive FuSa assessment of the accelerator exposed to faults in the datapath, by varying the network parameters, position, and characteristics of the induced error across multiple exhaustive data sets. Furthermore, we propose two novel strategies to obtain a diminutive set of functional test patterns to detect FuSa violation in a DNN accelerator. Our experimental results demonstrate that the obtained test sets can achieve an average of 92.63% (in some cases, up to 100%) fault coverage with cardinality as low as 0.1% of the entire test data set.
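
To make the fault model and the notion of fault coverage concrete, here is a minimal sketch. It is not the authors' framework. It assumes a weight-stationary array in which partial sums flow down each column, a stuck-at-1 fault on one bit of a single multiply-accumulate (MAC) output, and a detection criterion under which a fault counts as detected when any test input's predicted class changes. The article's own criterion may differ, and all names (`systolic_matvec`, `fault_coverage`, the `Fault` tuple) are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical fault encoding: (row, col, bit) of the MAC whose output bit is stuck at 1.
Fault = Tuple[int, int, int]


def systolic_matvec(weights: List[List[int]], x: List[int],
                    fault: Optional[Fault] = None) -> List[int]:
    """Simulate one pass through a weight-stationary array.

    weights[r][c] is held by the MAC at (r, c); x[r] enters row r and the
    partial sum of column c flows downward through rows 0..R-1.
    """
    rows, cols = len(weights), len(weights[0])
    out = []
    for c in range(cols):
        acc = 0
        for r in range(rows):
            acc += weights[r][c] * x[r]
            if fault is not None and (r, c) == (fault[0], fault[1]):
                acc |= 1 << fault[2]  # stuck-at-1 on one bit of the partial sum
        out.append(acc)
    return out


def predict(scores: List[int]) -> int:
    """Index of the largest score (first index wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def fault_coverage(weights: List[List[int]], tests: List[List[int]],
                   faults: List[Fault]) -> float:
    """Fraction of faults that flip the predicted class of at least one test."""
    golden = [predict(systolic_matvec(weights, x)) for x in tests]
    detected = 0
    for f in faults:
        if any(predict(systolic_matvec(weights, x, f)) != g
               for x, g in zip(tests, golden)):
            detected += 1
    return detected / len(faults)


if __name__ == "__main__":
    W = [[1, 0], [0, 1]]                 # toy 2x2 "layer"
    tests = [[3, 1]]                     # a single functional test pattern
    faults = [(0, 1, 4), (0, 0, 0)]      # a high-order-bit and a low-order-bit fault
    print(systolic_matvec(W, tests[0]))             # fault-free output
    print(systolic_matvec(W, tests[0], faults[0]))  # output with the first fault
    print(fault_coverage(W, tests, faults))
```

In this toy case the fault on a higher-order bit corrupts the whole column sum and flips the predicted class, while the low-order-bit fault is masked by this particular input. That is why the choice of test patterns determines how many faults a small test set can expose.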

Updated: 2021-02-26