Recurring Drift Detection and Model Selection-Based Ensemble Classification for Data Streams with Unlabeled Data, New Generation Computing

Recurring Drift Detection and Model Selection-Based Ensemble Classification for Data Streams with Unlabeled Data
New Generation Computing (IF 2.0), Pub Date: 2021-04-20, DOI: 10.1007/s00354-021-00126-2
Peipei Li, Man Wu, Junhong He, Xuegang Hu

Data stream classification is widely used in network monitoring, sensor networks, electronic commerce, and other fields. In real-world applications, however, recurring concept drift and missing labels make the classification of data streams considerably harder, and this challenge has received little attention from the research community. Motivated by this, we propose a new ensemble classification approach based on recurring concept drift detection and model selection for data streams with unlabeled data. First, we build an ensemble model from classifiers and clusters. To improve classification accuracy, we use the ensemble model to predict each data chunk and partition clusters according to the distribution of the predicted class labels. Second, we adopt a new concept drift detection method, based on the divergence between the concept distributions of adjoining data chunks, to distinguish recurring concept drifts. Every new concept encountered in the history of the stream is retained. We also introduce time-stamp-based weights for the base models in the ensemble. When selecting base models, we consider the time-stamp-based weight and the divergence between concept distributions simultaneously. Finally, extensive experiments on four benchmark data sets show that our approach adapts quickly to data streams with recurring concept drifts and improves classification accuracy compared with several state-of-the-art classification algorithms for data streams with concept drift and unlabeled data.
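
The abstract does not say which divergence measure, threshold, or weighting formula the authors use, so the following is only a rough sketch of the general idea, not the paper's method. It assumes that a concept can be summarized by the distribution of predicted class labels in a chunk, that Jensen-Shannon divergence compares two such distributions, and that the time-stamp-based weight decays exponentially with age. The names `label_distribution`, `js_divergence`, `classify_chunk`, and `model_score`, and the parameters `threshold` and `decay`, are hypothetical.

```python
import math
from collections import Counter


def label_distribution(labels, classes):
    """Relative frequency of each class among a chunk's predicted labels."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in classes]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, bounded in [0, 1].

    Assumption: the paper's actual divergence measure is not given in the abstract.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def classify_chunk(prev_dist, curr_dist, concept_pool, threshold=0.1):
    """Label the current chunk as 'stable', 'recurring', or 'new'.

    Returns (status, index into concept_pool or None).
    `threshold` is a hypothetical tuning parameter.
    """
    # No drift if adjoining chunks have similar distributions.
    if js_divergence(prev_dist, curr_dist) <= threshold:
        return "stable", None

    # Drift detected: search the retained concepts for the closest match.
    best_idx, best_div = None, float("inf")
    for i, dist in enumerate(concept_pool):
        d = js_divergence(dist, curr_dist)
        if d < best_div:
            best_idx, best_div = i, d

    if best_idx is not None and best_div <= threshold:
        return "recurring", best_idx

    # Unseen concept: retain it so a later recurrence can be recognized.
    concept_pool.append(curr_dist)
    return "new", len(concept_pool) - 1


def model_score(divergence, age, decay=0.5):
    """Score a base model by recency and similarity together.

    Assumption: exponential decay in `age` (chunks since the model was built),
    multiplied by (1 - divergence); higher scores are preferred.
    """
    return math.exp(-decay * age) * (1.0 - divergence)
```

In this sketch, a base model would be chosen by ranking candidates with `model_score`, so that a recent model trained on a similar concept outranks both an old model and a recent model trained on a dissimilar concept.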

Updated: 2021-04-20
