Fast Large-Scale Spectral Clustering via Explicit Feature Mapping
IEEE Transactions on Cybernetics ( IF 11.8 ) Pub Date : 2019-03-01 , DOI: 10.1109/tcyb.2018.2794998
Li He , Nilanjan Ray , Yisheng Guan , Hong Zhang

We propose an efficient spectral clustering method for large-scale data. The main idea of our method is to employ random Fourier features to explicitly represent data in kernel space. As a result, the complexity of spectral clustering is shown to be lower than that of existing Nyström approximations on large-scale data. With ${m}$ training points from a total of ${n}$ data points, the Nyström method requires ${O(nmd+m^{3}+nm^{2})}$ operations, where ${d}$ is the input dimension. In contrast, our proposed method requires ${O(nDd+D^{3}+n'D^{2})}$ operations, where ${n}'$ is the number of data points needed until convergence and ${D}$ is the kernel mapped dimension. In large-scale datasets where ${n' \ll n}$ holds, our explicit mapping method can significantly speed up eigenvector approximation and improve prediction speed in spectral clustering. For instance, on MNIST (60 000 data points), the proposed method achieves clustering accuracy similar to that of Nyström methods while running twice as fast.
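
The idea of clustering on an explicit feature map can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a Gaussian kernel $\exp(-\gamma\|x-y\|^{2})$, uses all ${n}$ points to build the ${D \times D}$ matrix (so that step costs ${O(nD^{2})}$ rather than the ${O(n'D^{2})}$ reported above), and finishes with a simple k-means. The function names, the parameter `gamma`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def random_fourier_features(X, D, gamma, rng):
    """Map X (n x d) to Z (n x D) so that Z @ Z.T approximates
    the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))  # random directions
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)              # costs O(nDd)


def spectral_embedding(Z, k):
    """Top-k eigenvectors of the normalized affinity built from Z Z^T,
    computed without ever forming the n x n matrix."""
    degrees = Z @ (Z.T @ np.ones(Z.shape[0]))    # row sums of Z Z^T in O(nD)
    degrees = np.maximum(degrees, 1e-12)         # approximate kernel sums may dip below zero
    Zn = Z / np.sqrt(degrees)[:, None]
    C = Zn.T @ Zn                                # D x D matrix, O(nD^2)
    eigvals, eigvecs = np.linalg.eigh(C)         # O(D^3), eigenvalues ascending
    V = eigvecs[:, -k:]
    S = np.sqrt(np.maximum(eigvals[-k:], 1e-12))
    U = (Zn @ V) / S                             # left singular vectors of Zn
    norms = np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return U / norms                             # row-normalize before k-means


def kmeans(U, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


def demo():
    """Cluster two well-separated 2-D Gaussian blobs (toy data, not MNIST)."""
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.5, size=(200, 2)),
                   rng.normal(6.0, 0.5, size=(200, 2))])
    y_true = np.repeat([0, 1], 200)
    Z = random_fourier_features(X, D=300, gamma=0.5, rng=rng)
    labels = kmeans(spectral_embedding(Z, k=2), k=2)
    agreement = np.mean(labels == y_true)
    return max(agreement, 1.0 - agreement)       # cluster labels are only defined up to a swap


if __name__ == "__main__":
    print("clustering accuracy:", demo())
```

The design point the sketch tries to show is that every expensive step works on the ${n \times D}$ feature matrix or the ${D \times D}$ matrix derived from it, which is where the ${nDd}$, ${D^{3}}$, and ${D^{2}}$ terms in the stated complexity come from.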

Updated: 2019-03-01