Fault Tolerant Lanczos Eigensolver via an Invariant Checking Method
Journal of Electronic Testing (IF 1.1), Pub Date: 2021-04-30, DOI: 10.1007/s10836-021-05945-1
Felix Loh, Kewal K. Saluja, Parameswaran Ramanathan

An extensive survey of the literature shows that the Lanczos eigensolver is a popular iterative method for approximating a few maximal eigenvalues of a real symmetric matrix, particularly when the matrix is large and sparse. In recent years, graphics processing units (GPUs) have become a popular platform for scientific computing applications, many of which are based on linear algebra, and they are increasingly used as the main computational units in supercomputers. This trend is expected to continue as the number of computations required by scientific applications reaches the petascale and exascale range. In this paper, building on our earlier work [22], we investigate in detail the error checking mechanism for the Lanczos eigensolver. We identify a low-cost invariant for efficient error checking and determine, through mathematical analysis, the efficiency of our mechanism when used by the Lanczos eigensolver. We evaluate the proposed fault tolerant scheme using an open-source sparse eigensolver on a GPU platform, with and without fault injection. We use a large number of sparse matrices from real applications to determine the efficiency and efficacy of our method, and our implementation shows that the proposed fault tolerant method has good error coverage and low overhead. To the best of our knowledge, we are the first to introduce such a scheme for the Lanczos method.
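The abstract does not say which invariant the authors check, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes one well-known property of the Lanczos iteration: in exact arithmetic each new Lanczos vector is orthogonal to the two preceding ones, so two extra dot products per iteration (cheap next to the sparse matrix-vector product) can flag a corrupted update. The function names, the tolerance, the dense pure-Python matrix, and the `fault_at` fault-injection switch are all hypothetical choices made for this example; a GPU implementation would look quite different.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    # Dense product for illustration only; a real solver would use a sparse kernel.
    return [dot(row, x) for row in A]

def lanczos_checked(A, v0, steps, tol=1e-8, fault_at=None):
    """Plain Lanczos iteration with a per-iteration orthogonality check.

    Assumed invariant (not taken from the paper): the new vector v_next is
    orthogonal to the current vector v and to the previous vector v_prev.
    Returns (alphas, betas, flagged), where flagged lists the iterations
    at which the check failed.
    """
    norm0 = math.sqrt(dot(v0, v0))
    v = [x / norm0 for x in v0]
    v_prev = [0.0] * len(v)
    beta = 0.0
    alphas, betas, flagged = [], [], []
    for j in range(steps):
        w = matvec(A, v)
        alpha = dot(w, v)
        # Three-term recurrence: w = A v - alpha v - beta v_prev
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        if j == fault_at:
            w[0] += 1.0  # hypothetical injected fault in the vector update
        alphas.append(alpha)
        beta_next = math.sqrt(dot(w, w))
        if beta_next < 1e-12:
            break  # invariant subspace reached; nothing left to check
        v_next = [wi / beta_next for wi in w]
        # Invariant check: two O(n) dot products per iteration.
        if abs(dot(v_next, v)) > tol or abs(dot(v_next, v_prev)) > tol:
            flagged.append(j)
        betas.append(beta_next)
        v_prev, v, beta = v, v_next, beta_next
    return alphas, betas, flagged

# Small symmetric test matrix (made up for this example).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
v0 = [1.0, 1.0, 1.0, 1.0]

_, _, clean_flags = lanczos_checked(A, v0, steps=3)
_, _, fault_flags = lanczos_checked(A, v0, steps=3, fault_at=0)
print("fault-free run flagged iterations:", clean_flags)
print("faulty run flagged iterations:", fault_flags)
```

In this sketch a detected violation is only recorded. A fault tolerant solver would presumably react to it, for example by recomputing the affected iteration, but the abstract does not describe the recovery policy.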


Updated: 2021-05-03