Subspace approximation with outliers
arXiv - CS - Data Structures and Algorithms. Pub Date: 2020-06-30. DOI: arxiv-2006.16573
Amit Deshpande and Rameshwar Pratap

The subspace approximation problem with outliers, for given $n$ points in $d$ dimensions $x_{1},\ldots, x_{n} \in R^{d}$, an integer $1 \leq k \leq d$, and an outlier parameter $0 \leq \alpha \leq 1$, is to find a $k$-dimensional linear subspace of $R^{d}$ that minimizes the sum of squared distances to its nearest $(1-\alpha)n$ points. More generally, the $\ell_{p}$ subspace approximation problem with outliers minimizes the sum of $p$-th powers of distances instead of the sum of squared distances. Even the case of robust PCA is non-trivial, and previous work requires additional assumptions on the input. Any multiplicative approximation algorithm for the subspace approximation problem with outliers must solve the robust subspace recovery problem, a special case in which the $(1-\alpha)n$ inliers in the optimal solution are promised to lie exactly on a $k$-dimensional linear subspace. However, robust subspace recovery is Small Set Expansion (SSE)-hard. We show how to extend dimension reduction techniques and bi-criteria approximations based on sampling to the problem of subspace approximation with outliers. To get around the SSE-hardness of robust subspace recovery, we assume that the squared distance error of the optimal $k$-dimensional subspace summed over the optimal $(1-\alpha)n$ inliers is at least $\delta$ times its squared error summed over all $n$ points, for some $0 < \delta \leq 1 - \alpha$. With this assumption, we give an efficient algorithm to find a subset of $poly(k/\epsilon) \log(1/\delta) \log\log(1/\delta)$ points whose span contains a $k$-dimensional subspace that gives a multiplicative $(1+\epsilon)$-approximation to the optimal solution. The running time of our algorithm is linear in $n$ and $d$. Interestingly, our results hold even when the fraction of outliers $\alpha$ is large, as long as the obvious condition $0 < \delta \leq 1 - \alpha$ is satisfied.
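The objective above is concrete enough to state in code. Below is a minimal NumPy sketch, not the paper's algorithm: it evaluates the $\ell_{p}$ cost of a candidate subspace under the outlier model (discard the $\alpha n$ farthest points) and checks the $\delta$-condition assumed in the abstract. The function names (outlier_robust_cost, delta_condition_holds) are illustrative assumptions; the paper's actual contribution, a sampling-based bi-criteria $(1+\epsilon)$-approximation, is not implemented here.

    import numpy as np

    def outlier_robust_cost(X, V, alpha, p=2):
        """l_p cost of the subspace spanned by the columns of V (d x k) over
        the nearest (1 - alpha) * n of the n points in X (n x d): the sum of
        p-th powers of distances, with the alpha * n farthest points dropped."""
        Q, _ = np.linalg.qr(V)                      # orthonormal basis for span(V)
        residuals = X - (X @ Q) @ Q.T               # component of each point orthogonal to span(V)
        dists = np.linalg.norm(residuals, axis=1)
        m = int(np.ceil((1 - alpha) * X.shape[0]))  # number of inliers kept
        return np.sum(np.sort(dists)[:m] ** p)

    def delta_condition_holds(X, V, alpha, delta):
        """The paper's assumption for a candidate optimal subspace V: its squared
        error summed over the (1 - alpha) * n inliers is at least delta times its
        squared error summed over all n points, for some 0 < delta <= 1 - alpha."""
        inlier_cost = outlier_robust_cost(X, V, alpha, p=2)
        total_cost = outlier_robust_cost(X, V, alpha=0.0, p=2)
        return inlier_cost >= delta * total_cost

For example, evaluating outlier_robust_cost with V set to the top-$k$ right singular vectors of X (plain PCA) gives an upper bound on the optimal robust cost, since the optimum minimizes this objective over all $k$-dimensional subspaces.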

Updated: 2020-07-01