A Survey of Singular Value Decomposition Methods for Distributed Tall/Skinny Data,arXiv - CS - Mathematical Software


arXiv - CS - Mathematical Software, Pub Date: 2020-09-02, DOI: arxiv-2009.00761
Drew Schmidt

The Singular Value Decomposition (SVD) is one of the most important matrix factorizations, with applications across many domains. In statistics and data analysis, common applications of the SVD include Principal Components Analysis (PCA) and linear regression. These applications usually arise on data with far more rows than columns, so-called "tall/skinny" matrices. In big data analytics, such a matrix may have hundreds of millions to billions of rows but only a few hundred columns. There is therefore a need for fast, accurate, and scalable tall/skinny SVD implementations that can fully utilize modern computing resources. To that end, we present a survey of three different algorithms for computing the SVD of data in these tall/skinny layouts, using MPI for communication. We contextualize these with common big data analytics techniques, principally PCA. Finally, we present both CPU and GPU timing results from the Summit supercomputer and discuss possible alternative approaches.
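The abstract does not name the three surveyed algorithms, so the following is only an illustrative sketch of one widely known way to compute the SVD of a row-distributed tall/skinny matrix, the Gram-matrix (cross-product) approach. It is not claimed to be the paper's method. The function names are hypothetical, NumPy stands in for a distributed runtime, and a plain Python sum over row blocks stands in for the MPI allreduce that a real implementation would use. The sketch also assumes the matrix has full column rank.

```python
import numpy as np


def local_gram(block):
    """Gram matrix of one row block (the work each MPI rank would do locally)."""
    return block.T @ block


def tall_skinny_svd_gram(blocks):
    """Sketch of a Gram-matrix SVD for A = vstack(blocks), with A tall and skinny.

    Hypothetical helper for illustration only; assumes full column rank.
    """
    # Summing the local n-by-n Gram matrices gives A^T A.
    # In a distributed setting this sum would be an allreduce.
    gram = sum(local_gram(b) for b in blocks)

    # A^T A = V diag(sigma^2) V^T, so a symmetric eigensolver yields V and sigma.
    evals, evecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]          # reorder to descending
    evals = np.clip(evals[order], 0.0, None)  # guard against tiny negative values
    sigma = np.sqrt(evals)
    v = evecs[:, order]

    # Each block of U is recovered locally: U_i = A_i V diag(1 / sigma).
    u_blocks = [b @ v / sigma for b in blocks]
    return u_blocks, sigma, v


# Small stand-in for a tall/skinny matrix split by rows across 4 "ranks".
rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 5))
blocks = np.array_split(a, 4, axis=0)
u_blocks, sigma, v = tall_skinny_svd_gram(blocks)
```

Only the small n-by-n Gram matrix needs to be communicated, which is what makes this approach attractive when the column count is small. The trade-off is that forming A^T A squares the condition number of A, so accuracy can suffer on ill-conditioned data.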


Updated: 2020-09-03
