The Complexity of Sparse Tensor PCA
arXiv - CS - Data Structures and Algorithms. Pub Date: 2021-06-11, DOI: arxiv-2106.06308
Davin Choo, Tommaso d'Orsi

We study the problem of sparse tensor principal component analysis: given a tensor $\pmb Y = \pmb W + \lambda x^{\otimes p}$ with $\pmb W \in \otimes^p\mathbb{R}^n$ having i.i.d. Gaussian entries, the goal is to recover the $k$-sparse unit vector $x \in \mathbb{R}^n$. The model captures both sparse PCA (in its Wigner form) and tensor PCA. For the highly sparse regime of $k \leq \sqrt{n}$, we present a family of algorithms that smoothly interpolates between a simple polynomial-time algorithm and the exponential-time exhaustive search algorithm. For any $1 \leq t \leq k$, our algorithms recover the sparse vector for signal-to-noise ratio $\lambda \geq \tilde{\mathcal{O}}(\sqrt{t} \cdot (k/t)^{p/2})$ in time $\tilde{\mathcal{O}}(n^{p+t})$, capturing the state-of-the-art guarantees for the matrix setting (in both the polynomial-time and sub-exponential time regimes). Our results naturally extend to the case of $r$ distinct $k$-sparse signals with disjoint supports, with guarantees that are independent of the number of spikes. Even in the restricted case of sparse PCA, known algorithms only recover the sparse vectors for $\lambda \geq \tilde{\mathcal{O}}(k \cdot r)$, while our algorithms require $\lambda \geq \tilde{\mathcal{O}}(k)$. Finally, by analyzing the low-degree likelihood ratio, we complement these algorithmic results with rigorous evidence illustrating the trade-offs between signal-to-noise ratio and running time. This lower bound captures the known lower bounds for both sparse PCA and tensor PCA. In this general model, we observe a more intricate three-way trade-off between the number of samples $n$, the sparsity $k$, and the tensor power $p$.
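The following is a minimal, illustrative sketch (not the authors' algorithm): it samples an instance of the spiked model $\pmb Y = \pmb W + \lambda x^{\otimes p}$ with i.i.d. Gaussian noise and a $k$-sparse unit vector $x$, then recovers $x$ by exhaustive search over size-$k$ supports, i.e. the exponential-time baseline that the paper's family of algorithms interpolates towards. All names and parameters here (`n`, `p`, `k`, `snr`, `exhaustive_search`) are assumptions introduced for illustration only.

```python
# Illustrative sketch only: spiked sparse tensor model plus brute-force
# support search. This is NOT the paper's interpolating algorithm family.
import itertools
import numpy as np

rng = np.random.default_rng(0)

def spiked_sparse_tensor(n, p, k, snr):
    """Return (Y, x) with Y = W + snr * x^{tensor p}, W ~ i.i.d. N(0, 1)."""
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)
    x /= np.linalg.norm(x)                      # k-sparse unit vector
    spike = x
    for _ in range(p - 1):                      # rank-one tensor x^{tensor p}
        spike = np.multiply.outer(spike, x)
    W = rng.standard_normal((n,) * p)           # Gaussian noise tensor
    return W + snr * spike, x

def exhaustive_search(Y, k, p, iters=50):
    """Brute force over all size-k supports; feasible only for tiny n."""
    n = Y.shape[0]
    best_val, best_x = -np.inf, None
    for S in itertools.combinations(range(n), k):
        sub = Y[np.ix_(*([list(S)] * p))]       # k x ... x k restriction of Y
        v = rng.standard_normal(k)
        v /= np.linalg.norm(v)
        for _ in range(iters):                  # tensor power iteration on sub
            u = sub
            for _ in range(p - 1):              # contract p-1 modes with v
                u = np.tensordot(u, v, axes=([-1], [0]))
            v = u / np.linalg.norm(u)
        u = sub
        for _ in range(p - 1):
            u = np.tensordot(u, v, axes=([-1], [0]))
        val = abs(float(u @ v))                 # |<sub, v^{tensor p}>|
        if val > best_val:
            best_val = val
            best_x = np.zeros(n)
            best_x[list(S)] = v
    return best_x

# Tiny usage example; the regime of interest in the paper has k <= sqrt(n).
Y, x_true = spiked_sparse_tensor(n=12, p=3, k=3, snr=8.0)
x_hat = exhaustive_search(Y, k=3, p=3)
print("recovery overlap |<x_hat, x>| =", abs(float(x_hat @ x_true)))
```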

Updated: 2021-06-14