Compression, Inversion, and Approximate PCA of Dense Kernel Matrices at Near-Linear Computational Complexity
Multiscale Modeling & Simulation (IF 1.6). Pub Date: 2021-04-15. DOI: 10.1137/19m129526x
Florian Schäfer , T. J. Sullivan , Houman Owhadi

Multiscale Modeling & Simulation, Volume 19, Issue 2, Pages 688-730, January 2021.
Dense kernel matrices $\Theta \in \mathbb{R}^{N \times N}$ obtained from point evaluations of a covariance function $G$ at locations $\{ x_{i} \}_{1 \leq i \leq N} \subset \mathbb{R}^{d}$ arise in statistics, machine learning, and numerical analysis. For covariance functions that are Green's functions of elliptic boundary value problems and homogeneously distributed sampling points, we show how to identify a subset $S \subset \{ 1 , \dots , N \}^2$, with $\# S = \mathcal{O} ( N \log (N) \log^{d} ( N /\epsilon ) )$, such that the zero fill-in incomplete Cholesky factorization of the sparse matrix $\Theta_{ij} \mathbf{1}_{( i, j ) \in S}$ is an $\epsilon$-approximation of $\Theta$. This factorization can provably be obtained in complexity $\mathcal{O} ( N \log( N ) \log^{d}( N /\epsilon) )$ in space and $\mathcal{O} ( N \log^{2}( N ) \log^{2d}( N /\epsilon) )$ in time, improving upon the state of the art for general elliptic operators; we further present numerical evidence that $d$ can be taken to be the intrinsic dimension of the data set rather than that of the ambient space. The algorithm only needs to know the spatial configuration of the $x_{i}$ and does not require an analytic representation of $G$. Furthermore, this factorization straightforwardly provides an approximate sparse PCA with optimal rate of convergence in the operator norm. Hence, by using only subsampling and the incomplete Cholesky factorization, we obtain, at nearly linear complexity, the compression, inversion, and approximate PCA of a large class of covariance matrices. By inverting the order of the Cholesky factorization we also obtain a solver for elliptic PDE with complexity $\mathcal{O} ( N \log^{d}( N /\epsilon) )$ in space and $\mathcal{O} ( N \log^{2d}( N /\epsilon) )$ in time, improving upon the state of the art for general elliptic operators.

