Discriminative clustering with representation learning with any ratio of labeled to unlabeled data
Statistics and Computing ( IF 2.2 ) Pub Date : 2022-01-29 , DOI: 10.1007/s11222-021-10067-x
Corinne Jones 1 , Vincent Roulet 2 , Zaid Harchaoui 2

We present a discriminative clustering approach in which the feature representation is learned from data and which can also leverage labeled data. Representation learning can give a similarity-based clustering method the ability to automatically adapt to an underlying, yet hidden, geometric structure of the data. The proposed approach augments the DIFFRAC method with a representation learning capability, using a gradient-based stochastic training algorithm and an optimal transport algorithm with entropic regularization to perform the cluster assignment step. The resulting method is evaluated on several real datasets while varying the ratio of labeled to unlabeled data, thereby interpolating between the fully unsupervised and the fully supervised regimes. The experimental results suggest that the proposed method can learn powerful feature representations even in the fully unsupervised regime and can leverage even small amounts of labeled data to improve the feature representations and to obtain better clusterings of complex datasets.
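
To make the cluster assignment step more concrete, here is a minimal sketch of entropy-regularized optimal transport solved with Sinkhorn iterations. It is not the authors' implementation. The function name `sinkhorn_assign`, the parameters `scores`, `eps`, and `n_iters`, and the choice of uniform (equal-size) cluster marginals are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def sinkhorn_assign(scores, eps=0.5, n_iters=200):
    """Soft, balanced cluster assignment via Sinkhorn iterations.

    scores : (n, k) array, higher means point i fits cluster j better
             (a hypothetical input, e.g. outputs of a learned model).
    eps    : entropic regularization strength (assumed value).
    Returns an (n, k) transport plan P with uniform row and column marginals.
    """
    n, k = scores.shape
    # Gibbs kernel; subtracting the max only rescales K and avoids overflow.
    K = np.exp((scores - scores.max()) / eps)
    r = np.full(n, 1.0 / n)  # each point carries mass 1/n
    c = np.full(k, 1.0 / k)  # assumption: every cluster receives mass 1/k
    u = np.ones(n)
    v = np.ones(k)
    for _ in range(n_iters):
        u = r / (K @ v)    # rescale rows to match r
        v = c / (K.T @ u)  # rescale columns to match c
    # Transport plan: diag(u) @ K @ diag(v)
    return u[:, None] * K * v[None, :]
```

Calling `sinkhorn_assign(scores)` on an `(n, k)` score matrix returns a plan `P`, and `P.argmax(axis=1)` then gives a hard cluster label for each point. A smaller `eps` pushes the plan toward a hard assignment but typically needs more iterations to converge.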




Updated: 2022-01-29