Learning in the presence of concept recurrence in data stream clustering, Journal of Big Data



Learning in the presence of concept recurrence in data stream clustering
Journal of Big Data (IF 8.6), Pub Date: 2020-09-15, DOI: 10.1186/s40537-020-00354-1
K. Namitha, G. Santhosh Kumar

In real-world data streams, the underlying data distribution is not static; it varies over time, and this variation is the primary cause of concept drift. Concept drift poses severe problems for the accuracy of a model in online learning scenarios. A recurring concept is a particular case of concept drift in which concepts already seen in the past reappear as the stream evolves. This problem has not yet been studied in the context of stream clustering. This paper proposes a novel algorithm for identifying recurring concepts in data stream clustering. When a concept recurs, the best-matching model is retrieved from the repository and reused. The algorithm has minimal memory requirements and works online with the stream. Some concepts and definitions already familiar from concept recurrence studies in stream classification have been redefined for clustering. Experiments conducted on real and synthetic data streams show that the proposed algorithm has the potential to identify recurring concepts.
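The abstract does not describe how models are stored or matched, so the following is only an illustrative sketch of the general idea of retrieving the closest stored model when a concept recurs. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the class name `ConceptRepository`, the representation of a clustering model as a list of centroids, the symmetric nearest-centroid distance, and the fixed `threshold`.

```python
import math


def centroid_distance(a, b):
    """Hypothetical similarity measure: symmetric mean nearest-centroid distance."""
    def one_way(src, dst):
        # Average distance from each centroid in src to its closest centroid in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return max(one_way(a, b), one_way(b, a))


class ConceptRepository:
    """Hypothetical store of past clustering models (each a list of centroids)."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed cut-off for "same concept"
        self.models = []

    def match_or_store(self, centroids):
        """Return (index, reused): reuse the closest stored model or store a new one."""
        best_index, best_dist = None, math.inf
        for i, stored in enumerate(self.models):
            d = centroid_distance(centroids, stored)
            if d < best_dist:
                best_index, best_dist = i, d
        if best_index is not None and best_dist <= self.threshold:
            return best_index, True  # recurring concept: reuse the stored model
        self.models.append(list(centroids))
        return len(self.models) - 1, False  # unseen concept: add it to the repository


repo = ConceptRepository(threshold=1.0)
first = repo.match_or_store([(0.0, 0.0), (5.0, 5.0)])       # new concept
second = repo.match_or_store([(20.0, 20.0), (30.0, 30.0)])  # a different concept
third = repo.match_or_store([(0.1, -0.1), (5.2, 4.9)])      # close to the first concept
print(first, second, third)
```

In this sketch the third window is matched to the first stored model because its centroids lie within the assumed threshold. A real system would also need a drift detector to decide when to query the repository, which is outside the scope of this example.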


Updated: 2020-09-15
