Kernel conditional clustering and kernel conditional semi-supervised learning
Knowledge and Information Systems (IF 2.5), Pub Date: 2019-06-06, DOI: 10.1007/s10115-019-01334-5
Xiao He, Thomas Gumbsch, Damian Roqueiro, Karsten Borgwardt

The results of clustering are often affected by covariates that are independent of the clusters one would like to discover. Traditionally, alternative clustering algorithms can be used to solve such clustering problems. However, these suffer from at least one of the following problems: (1) Continuous covariates or nonlinearly separable clusters cannot be handled; (2) assumptions are made about the distribution of the data; (3) one or more hyper-parameters need to be set. The presence of covariates also has an effect in a different type of problem such as semi-supervised learning. To the best of our knowledge, there is no existing method addressing the semi-supervised learning setting in the presence of covariates. Here we propose two novel algorithms, named kernel conditional clustering (KCC) and kernel conditional semi-supervised learning (KCSSL), whose objectives are derived from a kernel-based conditional dependence measure. KCC is parameter-light and makes no assumptions about the cluster structure, the covariates, or the distribution of the data, while KCSSL is fully parameter-free. On both simulated and real-world datasets, the proposed KCC and KCSSL algorithms perform better than state-of-the-art methods. The former detects the ground truth cluster structures more accurately, and the latter makes more accurate predictions.
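
The abstract does not state the KCC or KCSSL objectives, so the following is only a rough sketch of what a kernel-based conditional dependence measure can look like, not the authors' formulation. It assumes RBF kernels, a ridge-style residualization with a hypothetical regularizer `eps`, and a trace statistic in the style of kernel conditional independence tests; the function names, `gamma`, and `eps` are all illustrative choices.

```python
import numpy as np


def rbf_kernel(a, gamma=1.0):
    """RBF Gram matrix for samples in rows of `a` (1-D input is treated as one feature)."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    sq = np.sum(a ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * a @ a.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def center(k):
    """Double-center a Gram matrix: H K H with H = I - 11^T / n."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h


def conditional_dependence(x, y, z, eps=1e-3, gamma=1.0):
    """Illustrative dependence score between x and y after regressing out z.

    Assumption: the effect of the covariate z is removed with the
    residual-forming matrix R = eps * (K_z + eps * I)^{-1}; this is a
    generic construction, not the objective defined in the paper.
    """
    n = len(x)
    kx, ky, kz = (center(rbf_kernel(v, gamma)) for v in (x, y, z))
    r = eps * np.linalg.inv(kz + eps * np.eye(n))
    kx_given_z = r @ kx @ r
    ky_given_z = r @ ky @ r
    return float(np.trace(kx_given_z @ ky_given_z)) / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=100)        # covariate
    x = z + rng.normal(size=100)    # data influenced by the covariate
    y = z + rng.normal(size=100)    # depends on x only through z
    print(conditional_dependence(x, y, z))  # related only via z
    print(conditional_dependence(x, x, z))  # identical variables
```

In a clustering setting, `y` would play the role of a candidate cluster assignment and `z` the covariate whose influence should be discounted; how the paper turns such a measure into an optimization problem is not described in the abstract.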

Updated: 2019-06-06