Reuse-centric k-means configuration
Information Systems (IF 3.0), Pub Date: 2021-04-16, DOI: 10.1016/j.is.2021.101787
Lijun Zhang, Hui Guan, Yufei Ding, Xipeng Shen, Hamid Krim

K-means configuration is the task of finding a configuration of k-means (e.g., the number of clusters or the feature set) that maximizes some objective. It is time-consuming because k-means is iterative. This paper proposes reuse-centric k-means configuration to accelerate the process. The approach rests on the observation that explorations of different configurations share many common or similar computations, so effectively reusing computations from prior trials can substantially shorten configuration time. To realize the idea, the paper presents a set of novel techniques, including reuse-based filtering, center reuse, and a two-phase design, that capitalize on reuse opportunities at three levels: validation, number of clusters, and feature sets. Experiments on k-means–based data classification tasks show that reuse-centric k-means configuration speeds up a heuristic search-based configuration process by a factor of 5.8, and a uniform search-based attainment of classification error surfaces by a factor of 9.1. The paper also provides insights on how to apply the acceleration techniques effectively to realize their full potential.
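The abstract does not spell out how center reuse works, so the following is only a minimal sketch of one plausible reading: when sweeping the number of clusters, the converged centers for k are kept and used to warm-start the run for k+1, rather than restarting from scratch. The function names (`lloyd`, `grow_centers`, `sweep_k`, `sse`) and the farthest-point rule for adding a center are assumptions made for illustration, not the paper's method.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, centers, max_iter=100):
    """Run Lloyd's k-means from the given initial centers.

    Returns (final_centers, iterations_used).
    """
    centers = [tuple(c) for c in centers]
    for it in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its members.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            return centers, it
        centers = new_centers
    return centers, max_iter


def grow_centers(points, centers):
    """Hypothetical reuse rule: keep the existing centers and add the
    data point farthest from all of them as one extra center."""
    farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
    return list(centers) + [tuple(farthest)]


def sweep_k(points, k_values, seed=0):
    """Try each k in ascending order, warm-starting from the previous result.

    Returns {k: (centers, iterations_used)}.
    """
    rng = random.Random(seed)
    results = {}
    centers = None
    for k in sorted(k_values):
        if centers is None:
            # First trial: nothing to reuse, so pick random data points.
            centers = rng.sample(points, k)
        else:
            # Later trials: reuse the converged centers and add what is missing.
            while len(centers) < k:
                centers = grow_centers(points, centers)
        centers, iters = lloyd(points, centers)
        results[k] = (centers, iters)
    return results
```

Calling `sweep_k(points, [2, 3, 4])` on a list of coordinate tuples returns the centers and iteration count for each k. Because each warm start begins from an already converged solution plus one extra center, the clustering error in this sketch cannot increase from one k to the next. Whether such warm starts also save iterations depends on the data.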




Updated: 2021-04-28