Batched computation of the singular value decompositions of order two by the AVX-512 vectorization
arXiv - CS - Mathematical Software Pub Date : 2020-05-15 , DOI: arxiv-2005.07403
Vedran Novaković

In this paper, a vectorized algorithm for simultaneously computing up to eight singular value decompositions (SVDs, each of the form $A=U\Sigma V^{\ast}$) of real or complex matrices of order two is proposed. The algorithm extends to a batch of matrices of arbitrary length $n$, which arises, for example, in the annihilation part of the parallel Kogbetliantz algorithm for the SVD of a square matrix of order $2n$. The SVD algorithm for a single matrix of order two is derived first. It scales the input matrix $A$, in most instances error-free, such that its singular values $\Sigma_{ii}$ cannot overflow whenever its elements are finite. It then computes the URV factorization of the scaled matrix, followed by the SVD of the non-negative upper-triangular middle factor. A vector-friendly data layout for the batch is then introduced, in which the same-indexed elements of each of the input and output matrices form vectors, and the algorithm's steps over such vectors are described. The vectorized approach is then shown to be about three times faster than processing each matrix in isolation, while slightly improving accuracy over the straightforward method for the $2\times 2$ SVD.
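
The two ideas in the abstract that are easiest to illustrate are the same-index data layout and the power-of-two prescaling. The following is a minimal Python sketch of both for real matrices, and it is not the paper's algorithm: it uses no AVX-512 and no URV factorization, it returns singular values only, and it substitutes the textbook closed-form expression for the singular values of a $2\times 2$ matrix. The function names `to_lanes` and `batched_singular_values`, and the choice of $2^{1022}$ as the scaling bound, are assumptions made for illustration. The loop over `zip(...)` stands in for the lanes that a vector instruction would process at once.

```python
import math


def to_lanes(mats):
    """Split 2x2 matrices [[a11, a12], [a21, a22]] into four same-index lists.

    Each returned list holds one matrix element position for the whole batch,
    mimicking the layout in which same-indexed elements form a vector.
    """
    a11 = [m[0][0] for m in mats]
    a21 = [m[1][0] for m in mats]
    a12 = [m[0][1] for m in mats]
    a22 = [m[1][1] for m in mats]
    return a11, a21, a12, a22


def batched_singular_values(a11, a21, a12, a22):
    """Return (s_max, s_min, exps) for a batch of finite real 2x2 matrices.

    s_max[i] and s_min[i] are the singular values of the i-th matrix after it
    has been multiplied by 2**exps[i]; the caller undoes the scaling.
    """
    s_max, s_min, exps = [], [], []
    for a, c, b, d in zip(a11, a21, a12, a22):  # one iteration per "lane"
        big = max(abs(a), abs(b), abs(c), abs(d))
        # frexp gives big = m * 2**e with 0.5 <= m < 1, hence big < 2**e.
        e = math.frexp(big)[1]
        # Assumed bound: after scaling, every |element| < 2**1022, so the
        # largest singular value (at most twice that) stays finite.
        s = 1022 - e
        # Scaling by a power of two is exact unless a tiny element underflows.
        a, b, c, d = (math.ldexp(x, s) for x in (a, b, c, d))
        # Textbook closed form for the singular values of [[a, b], [c, d]].
        q = math.hypot((a + d) / 2, (c - b) / 2)
        r = math.hypot((a - d) / 2, (c + b) / 2)
        s_max.append(q + r)
        s_min.append(abs(q - r))
        exps.append(s)
    return s_max, s_min, exps
```

For inputs of moderate magnitude, the scaling can be undone with `math.ldexp(value, -exponent)`. For inputs near the overflow threshold, the true singular value may not be representable, so this sketch returns the scaled value and its exponent separately rather than unscaling internally.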

Updated: 2020-05-18