Hardware and Software Co-optimization for the Initialization Failure of the ReRAM-based Cross-bar Array
ACM Journal on Emerging Technologies in Computing Systems (IF 2.1), Pub Date: 2020-07-07, DOI: 10.1145/3393669
Youngseok Kim, Seyoung Kim, Chun-Chen Yeh, Vijay Narayanan, Jungwook Choi

Recent advances in deep neural networks require handling millions of parameters or more and demand high-performance computing resources with improved efficiency. The cross-bar array architecture is considered one of the promising deep learning architectures, showing a significant computing gain over conventional processors. To investigate the feasibility of the architecture, we examine its non-idealities and their impact on performance. Specifically, we study the impact of cells that fail during the initialization process of the resistive memory-based cross-bar array. Unlike in a conventional memory array, individual memory elements cannot be rerouted, so failed cells may have a critical impact on model accuracy. We categorize the possible failures and propose a hardware implementation that minimizes catastrophic failures. This hardware optimization bounds the possible logical value of the failed cells and allows us to compensate for the loss of accuracy via off-line training. By introducing random weight defects during training, we show that the model becomes more resilient to device initialization failures and is therefore less prone to degraded inference performance caused by failed devices. Our study sheds light on a hardware and software co-optimization procedure for coping with potentially catastrophic failures in the cross-bar array.
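
The idea of introducing random weight defects during training can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single linear layer, the squared-error loss, and the choice of 0.0 as the bounded value of a failed cell are all assumptions made for illustration. Each training step samples a fresh defect mask, runs the forward pass with the defective weights, and updates only the cells that were working in that step.

```python
import random

def sample_defect_mask(rows, cols, fail_rate, rng):
    """Return a rows x cols mask; True marks a cell assumed to have failed."""
    return [[rng.random() < fail_rate for _ in range(cols)] for _ in range(rows)]

def apply_defects(weights, mask, stuck_value=0.0):
    """Copy the weights, forcing failed cells to stuck_value (an assumed bound)."""
    return [
        [stuck_value if failed else w for w, failed in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]

def forward(weights, x):
    """Matrix-vector product, the operation a cross-bar array computes."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def train_step(weights, x, target, fail_rate, lr, rng):
    """One SGD step on 0.5 * squared error with a fresh random defect mask."""
    mask = sample_defect_mask(len(weights), len(x), fail_rate, rng)
    out = forward(apply_defects(weights, mask), x)
    err = [o - t for o, t in zip(out, target)]
    # A stuck cell's output does not depend on its stored weight,
    # so the gradient only reaches cells that worked in this step.
    return [
        [w if failed else w - lr * e * xi for w, failed, xi in zip(w_row, m_row, x)]
        for w_row, m_row, e in zip(weights, mask, err)
    ]

if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[0.5, -0.2], [0.3, 0.8]]
    for _ in range(100):
        weights = train_step(weights, [1.0, 2.0], [1.0, 0.0], 0.1, 0.05, rng)
    print(weights)
```

Resampling the mask at every step, rather than fixing one mask, reflects the assumption that the positions of failed cells are unknown before the array is initialized.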

Updated: 2020-07-07