Non-Redundant Subspace Clusterings with Nr-Kmeans and Nr-DipMeans
ACM Transactions on Knowledge Discovery from Data (IF 3.6), Pub Date: 2020-06-22, DOI: 10.1145/3385652
Dominik Mautz¹, Wei Ye², Claudia Plant³, Christian Böhm¹

A large collection of objects in a high-dimensional space can often be clustered in more than one way. For instance, objects could be clustered by their shape or, alternatively, by their color. Each grouping represents a different view of the dataset. The new research field of non-redundant clustering addresses this class of problems. In this article, we assume that different, non-redundant k-means-like clusterings may exist in different, arbitrarily oriented subspaces of the high-dimensional space. We further assume that these subspaces (and, optionally, an additional noise space without any cluster structure) are mutually orthogonal. This assumption enables a particularly rigorous mathematical treatment of the non-redundant clustering problem and therefore a particularly efficient algorithm, which we call Nr-Kmeans (for non-redundant k-means). We demonstrate the superiority of our algorithm both theoretically and in extensive experiments. Further, we propose an extension of Nr-Kmeans, Nr-DipMeans, that uses Hartigan's dip test to determine the number of clusters in each subspace automatically.
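The orthogonality assumption is what makes the problem tractable, and a small sketch shows why. Suppose the orthogonal rotation and the split of its basis vectors into subspaces are already known. Each clustering is then scored only on its own projection, so the clusterings decouple and can each be fit by ordinary k-means. This is a minimal illustration under that assumption and is not the authors' Nr-Kmeans: finding the rotation itself, which the abstract implies is needed for arbitrarily oriented subspaces, is omitted. The function names (`project`, `kmeans`, `nr_kmeans_fixed_rotation`), the modeling of the noise space as a single cluster (k = 1), and the toy data are hypothetical choices made for this example.

```python
# Hypothetical sketch, not the authors' Nr-Kmeans: the orthogonal subspaces
# (i.e., the rotation) are assumed to be given, so only the clusterings are fit.
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def project(x: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Coordinates of x in the subspace spanned by orthonormal `basis` vectors."""
    return [sum(xi * bi for xi, bi in zip(x, b)) for b in basis]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[int], float]:
    """Plain Lloyd's k-means; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center (ties go to the lower index).
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: cluster means; keep the old center if a cluster empties.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, cost


def nr_kmeans_fixed_rotation(data, subspaces, ks, iters=50, seed=0):
    """Cluster each (mutually orthogonal) subspace independently.

    subspaces: one list of orthonormal basis vectors per subspace.
    ks: clusters per subspace; k = 1 stands in for a structureless noise space.
    """
    results = []
    for basis, k in zip(subspaces, ks):
        projected = [project(x, basis) for x in data]
        results.append(kmeans(projected, k, iters, seed))
    return results


# Toy data (hypothetical): two independent 2-way groupings in a plane rotated
# by 45 degrees, plus a third axis that carries only small noise.
c = 1 / math.sqrt(2)
u, w, z = [c, c, 0.0], [-c, c, 0.0], [0.0, 0.0, 1.0]
jitter = [0.0, 0.3, -0.2, 0.4, -0.1, 0.2, -0.3, 0.1]
data, truth_a, truth_b = [], [], []
i = 0
for s in (0.0, 10.0):
    for t in (0.0, 10.0):
        for _ in range(2):
            a, b, e = s + jitter[i], t + jitter[(i + 3) % 8], jitter[(i + 5) % 8]
            data.append([a * u[d] + b * w[d] + e * z[d] for d in range(3)])
            truth_a.append(s > 5)
            truth_b.append(t > 5)
            i += 1

results = nr_kmeans_fixed_rotation(data, [[u], [w], [z]], ks=[2, 2, 1])
for name, (labels, cost) in zip(["view A", "view B", "noise"], results):
    print(name, labels, round(cost, 3))
```

With the rotation supplied, view A recovers the grouping along `u` and view B recovers the independent grouping along `w`, although neither grouping is aligned with the original coordinate axes. The noise axis collapses into one cluster. Because the views are scored on orthogonal projections, the total cost is simply the sum of the per-view costs.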

Updated: 2020-06-22