G-SEAP: Analyzing and characterizing soft-error aware approximation in GPGPUs
Future Generation Computer Systems ( IF 6.2 ) Pub Date : 2020-03-27 , DOI: 10.1016/j.future.2020.03.040
Xiaohui Wei , Hengshan Yue , Shang Gao , Lina Li , Ruyu Zhang , Jingweijia Tan

As General-Purpose Graphics Processing Units (GPGPUs) become pervasive in High-Performance Computing (HPC), protecting programs from soft errors has become increasingly important. Soft errors may cause Silent Data Corruptions (SDCs), which silently produce erroneous execution results. Because of the massive parallelism of GPGPUs, fully protecting them against soft errors introduces nontrivial overhead. Fortunately, some HPC programs inherently tolerate imprecise execution outcomes. Leveraging this tolerance, selective soft-error protection can be applied to reduce energy consumption.

In this work, we first propose a GPGPU-based Soft-Error aware APproximation analysis framework (G-SEAP) to characterize the approximability of soft errors. Using G-SEAP, we perform an exhaustive analysis of 17 representative HPC benchmarks and observe that, on average, 72.7% of SDCs are approximable. We also observe that the application's dataflow, the reliability requirements of kernel functions, instruction type, and data bit location are all important factors for program correctness. Lastly, based on these observations, we design an approximate Error Correction Code (ECC) mechanism and an approximate instruction duplication technique to illustrate how G-SEAP provides useful guidance for energy-efficient soft-error elimination in GPGPUs.
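
To make the role of data bit location concrete, the following minimal Python sketch flips one bit at a time in a 32-bit float and checks whether the corrupted value stays within a relative-error tolerance. It is only an illustration, not the G-SEAP framework: the function names (`flip_bit`, `is_approximable`, `approximable_bits`), the relative-error criterion, and the 1% default tolerance are all assumptions that do not come from the abstract.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` as an IEEE-754 float32 with one bit flipped (0 = least significant)."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def is_approximable(golden: float, faulty: float, tolerance: float = 0.01) -> bool:
    """Hypothetical criterion: the faulty value is finite and within `tolerance` relative error."""
    if not math.isfinite(faulty):
        return False  # a flip that yields NaN or infinity is never acceptable here
    if golden == 0.0:
        return abs(faulty) <= tolerance  # fall back to absolute error at zero
    return abs(faulty - golden) / abs(golden) <= tolerance


def approximable_bits(golden: float, tolerance: float = 0.01) -> list[int]:
    """List the bit positions whose single-bit flip passes the assumed criterion."""
    return [
        bit
        for bit in range(32)
        if is_approximable(golden, flip_bit(golden, bit), tolerance)
    ]


if __name__ == "__main__":
    print(approximable_bits(1.0))
```

Under these assumptions, for the value 1.0 only flips in the 17 low-order mantissa bits (positions 0 to 16) pass the check, while flips in the high mantissa bits, the exponent, or the sign do not. That is one simple way to see why bit location matters when deciding what to protect.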




Updated: 2020-03-27