On Soft Errors in the Conjugate Gradient Method: Sensitivity and Robust Numerical Detection
SIAM Journal on Scientific Computing (IF 3.0), Pub Date: 2020-11-18, DOI: 10.1137/18m122858x
Emmanuel Agullo, Siegfried Cools, Emrullah Fatih Yetkin, Luc Giraud, Nick Schenkels, Wim Vanroose

SIAM Journal on Scientific Computing, Volume 42, Issue 6, Pages C335-C358, January 2020.
The conjugate gradient (CG) method is the most widely used iterative scheme for solving large sparse systems of linear equations whose matrix is symmetric positive definite. Although it is more than 60 years old, it remains a serious candidate for extreme-scale computations on large computing platforms. On the technological side, the continuous shrinking of transistor geometry and the increasing complexity of these devices dramatically affect their sensitivity to natural radiation and thus diminish their reliability. One of the most common effects of natural radiation is the single event upset (SEU), a bit-flip in a memory cell that produces unexpected results at the application level. Consequently, future extreme-scale computing facilities will be more prone to errors of all kinds, including bit-flips, during their calculations. These numerical and technological observations motivate this work. We first investigate, through extensive numerical experiments, the sensitivity of CG to bit-flips in its main computationally intensive kernels, namely the matrix-vector product and the preconditioner application. We then propose numerical criteria to detect the occurrence of such soft errors and assess their robustness through extensive numerical experiments.
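The abstract does not spell out the detection criteria, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It runs unpreconditioned CG (so preconditioner faults are not modeled), injects a single bit-flip into one entry of the matrix-vector product `A @ p` at a chosen iteration, and uses a common residual-gap check as an illustrative stand-in detector: in exact arithmetic the recursively updated residual `r` equals the true residual `b - A @ x`, and rounding keeps the gap between them tiny, whereas a corrupted product leaves a gap of roughly `|alpha| * |q_bad - A p|`. The function names `flip_bit` and `cg_with_gap_check`, the threshold `gap_tol`, the choice of bit 55 (an exponent bit of an IEEE 754 double), and the test matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def flip_bit(vec, index, bit):
    """Return a copy of float64 `vec` with bit `bit` of vec[index] flipped."""
    out = np.array(vec, dtype=np.float64, copy=True)
    words = out.view(np.uint64)  # reinterpret the same memory as raw 64-bit words
    words[index] ^= np.uint64(1) << np.uint64(bit)
    return out


def cg_with_gap_check(A, b, tol=1e-10, maxiter=200, gap_tol=1e-8,
                      fault_iter=None, fault_bit=55):
    """Unpreconditioned CG with a hypothetical residual-gap fault detector.

    If `fault_iter` is set, one bit of the product A @ p is flipped at that
    iteration (illustrative fault model). Returns (x, k, detected), where k
    is the iteration at which the loop stopped.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    bnorm = np.linalg.norm(b)
    for k in range(maxiter):
        q = A @ p  # the matrix-vector product, i.e. the kernel we corrupt
        if k == fault_iter:
            # Corrupt the largest-magnitude entry so the flip is not hidden by a zero.
            q = flip_bit(q, int(np.argmax(np.abs(q))), fault_bit)
        alpha = rr / (p @ q)
        x = x + alpha * p
        r = r - alpha * q  # recursively updated residual
        # Extra product per iteration: compare the true and recursive residuals.
        gap = np.linalg.norm(b - A @ x - r)
        if not gap <= gap_tol * bnorm:  # written this way so NaN also counts as a fault
            return x, k, True
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * bnorm:
            return x, k, False
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, maxiter, False


# Usage: SPD tridiagonal test matrix with eigenvalues in (2, 6).
n = 50
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.random.default_rng(0).standard_normal(n)
x, k, detected = cg_with_gap_check(A, b)                     # fault-free run
x_f, k_f, detected_f = cg_with_gap_check(A, b, fault_iter=3)  # one bit-flip at iteration 3
```

Computing the true residual costs one extra matrix-vector product per iteration, which roughly doubles the dominant cost; a practical scheme would likely check less often or use a cheaper criterion, which is the kind of trade-off the robustness experiments described above are meant to assess.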

Updated: 2020-12-04