An empirical evaluation of random transformations applied to ensemble clustering
Multimedia Tools and Applications (IF 3.0) Pub Date: 2020-07-28, DOI: 10.1007/s11042-020-08947-x
Gabriel Damasceno Rodrigues, Marcelo Keese Albertini, Xiaomin Yang

Ensemble clustering techniques have improved in recent years, offering better average performance across domains and data sets. Their benefits range from finding novel clusterings that are unattainable by any single clustering algorithm to providing clustering stability, such that quality is little affected by noise, outliers, or sampling variations. The main clustering-ensemble strategies are: combining the results of different clustering algorithms; producing different results by resampling the data, as in bagging and boosting techniques; and executing a given algorithm multiple times with different parameters or initializations. Ensemble techniques are often developed for supervised settings and later adapted to the unsupervised setting. Recently, Blaser and Fryzlewicz proposed an ensemble technique for classification based on resampling and transforming input data. Specifically, they employed random rotations to significantly improve Random Forest performance. In this work, we empirically studied the effects of random transformations based on rotation matrices, Mahalanobis distance, and density proximity on ensemble clustering. Our experiments considered 12 data sets and 25 variations of random transformations, giving a total of 5580 data sets applied to 8 algorithms and evaluated with 4 clustering measures. Statistical tests identified 17 random transformations that can be viably applied to ensembles and standard clustering algorithms, with positive effects on cluster quality. In our results, the best-performing transformations were those based on the Mahalanobis distance.




Updated: 2020-07-28