CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities
Pattern Recognition (IF 7.5), Pub Date: 2021-04-08, DOI: 10.1016/j.patcog.2021.107977
Ye Zhu, Kai Ming Ting, Mark J. Carman, Maia Angelova

The problem of inhomogeneous cluster densities has been a long-standing issue for distance-based and density-based algorithms in clustering and anomaly detection. These algorithms implicitly assume that all clusters have approximately the same density. As a result, they often exhibit a bias towards dense clusters in the presence of sparse clusters. Many remedies have been suggested; yet, we show that they are partial solutions which do not address the issue satisfactorily. To match the implicit assumption, we propose to transform a given dataset such that the transformed clusters have approximately the same density while all regions of locally low density become globally low density—homogenising cluster density while preserving the cluster structure of the dataset. We show that this can be achieved by using a new multi-dimensional Cumulative Distribution Function in a transform-and-shift method. The method can be applied to every dataset, before the dataset is used in many existing algorithms to match their implicit assumption without algorithmic modification. We show that the proposed method performs better than existing remedies.
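
As a rough illustration of why a CDF is a natural tool for homogenising density, the sketch below applies a plain one-dimensional empirical CDF (rank) transform to two clusters of very different spread. This is not the paper's transform-and-shift method, whose details the abstract does not give; the function names `ecdf_transform` and `mean_gap`, the cluster parameters, and the sample sizes are all assumptions made for this example.

```python
import numpy as np

def ecdf_transform(x):
    """Map each value to its empirical CDF value in (0, 1] (a global rank transform)."""
    x = np.asarray(x, dtype=float)
    # argsort of argsort yields the 0-based rank of every element
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    return (ranks + 1) / x.size

def mean_gap(values):
    """Average spacing between consecutive sorted values (a crude inverse-density proxy)."""
    v = np.sort(values)
    return float(np.mean(np.diff(v)))

# Hypothetical data: one dense and one sparse cluster, well separated in 1-D.
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.1, size=200)
sparse = rng.normal(loc=10.0, scale=2.0, size=200)
x = np.concatenate([dense, sparse])

t = ecdf_transform(x)

# Spacing inside each cluster, before and after the transform.
gap_before = (mean_gap(dense), mean_gap(sparse))
gap_after = (mean_gap(t[:200]), mean_gap(t[200:]))

print("mean gap before (dense, sparse):", gap_before)
print("mean gap after  (dense, sparse):", gap_after)
```

Note that a global rank transform like this one also flattens the low-density gap between the two clusters, so it equalises density without keeping the clusters apart. The abstract states that the proposed method preserves the cluster structure, which this naive sketch does not attempt to do.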




Updated: 2021-04-19