RobustRepStream: Robust stream clustering using self-controlled connectivity graph
Intelligent Data Analysis (IF 0.9), Pub Date: 2020-07-15, DOI: 10.3233/ida-194715
Ross Callister, Mihai Lazarescu, Duc-Son Pham

A major challenge in stream clustering is the evolution of the statistical properties of the underlying data. As clustering is inherently unsupervised, selecting suitable parameter values is often difficult. Clustering algorithms with sensitive parameters are often not robust to such changes, leading to poor clustering outputs. Algorithms using K-NN graphs face this problem, as they have a sensitive K-connectivity parameter that prevents them from adapting to stream concept evolution. We address this by introducing a novel skewness excess concept and controlling the skewness excess of the edge length distributions in the underlying K-NN graph. We demonstrate the asymptotic linear dependency of skewness excess on the graph connectivity and propose the novel RobustRepStream algorithm, which extends the RepStream algorithm and provides improved robustness against stream evolution. By automatically controlling the skewness excess, the user no longer needs to specify the K-connectivity parameter, and RobustRepStream can adjust the graph connectivity locally to achieve performance close to that obtained when the optimal K value is known. We demonstrate that RobustRepStream’s skewness threshold parameter is insensitive and universal across all data sets. We comprehensively evaluate RobustRepStream on real-world benchmark data sets against previous stream clustering algorithms, and demonstrate that it provides better clustering performance.
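
To make the quantity in the abstract more concrete, the following is a minimal sketch of how the skewness of K-NN edge lengths might be measured as the connectivity K varies. It is not the authors' RobustRepStream implementation: the abstract does not give the exact definition of skewness excess, so the sketch uses the ordinary sample skewness as a stand-in, computes it globally over a static point set rather than locally on a stream, and the function names (`knn_edge_lengths`, `sample_skewness`, `skewness_profile`) are hypothetical.

```python
import numpy as np


def knn_edge_lengths(points, k):
    """Return an (n, k) array: distances from each point to its k nearest neighbours."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # exclude self-edges
    return np.sort(dists, axis=1)[:, :k]


def sample_skewness(values):
    """Ordinary (biased) sample skewness: third central moment over std cubed.

    Assumption: used here as a stand-in for the paper's skewness excess,
    whose exact definition is not stated in the abstract.
    """
    values = np.asarray(values, dtype=float).ravel()
    centered = values - values.mean()
    std = centered.std()
    if std == 0.0:
        return 0.0
    return float((centered ** 3).mean() / std ** 3)


def skewness_profile(points, ks):
    """Map each candidate k to the skewness of all K-NN edge lengths (global, not local)."""
    return {k: sample_skewness(knn_edge_lengths(points, k)) for k in ks}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(200, 2))  # synthetic data, for illustration only
    for k, skew in skewness_profile(points, range(2, 11)).items():
        print(f"k={k:2d}  edge-length skewness={skew:.3f}")
```

A profile like this only shows how the edge length distribution changes with K on one data set; how RobustRepStream turns such a measurement into a local connectivity adjustment is described in the paper itself, not in this sketch.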

Updated: 2020-07-22