Online reliable semi-supervised learning on evolving data streams, Information Sciences

Information Sciences, Pub Date: 2020-03-24, DOI: 10.1016/j.ins.2020.03.052
Salah Ud Din, Junming Shao, Jay Kumar, Waqar Ali, Jiaming Liu, Yu Ye

In today's digital era, massive amounts of streaming data are generated automatically and continuously. Many algorithms for learning from such data streams have been proposed over the last decade. Because streaming data are dynamic, these algorithms must adapt to concept drift and operate under limited memory and time. Most existing work assumes that the true class labels of all incoming instances are available immediately. In real-world applications, however, labeling every item in a data stream is time- and resource-consuming; a more realistic situation is that only a few instances in the stream are labeled. Designing an efficient and effective learning algorithm that handles concept drift and label scarcity while working under limited resources is therefore of significant importance. In this paper, we propose a new online semi-supervised learning algorithm that models concept drift with a set of micro-clusters. These micro-clusters are dynamically maintained through error-based representative learning to capture evolving concepts. In this way, local concept drifts are captured more quickly, which in turn supports effective data stream learning. Extensive experiments on several data sets demonstrate that our model achieves high classification performance compared with many state-of-the-art algorithms.
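
The abstract describes the method only at a high level. The following is a minimal Python sketch of the general idea of a micro-cluster-based stream classifier with error-driven maintenance, written under our own assumptions; it is not the authors' algorithm. Each hypothetical `MicroCluster` keeps a point count, a linear sum, a sum of squared norms, a class label, and a reliability score. When the nearest cluster misclassifies a labeled instance, its reliability is lowered and a new cluster is started for that instance, so locally drifting regions are re-covered quickly; unlabeled instances are absorbed only if they fall inside the nearest cluster's radius. All class names, parameters (`penalty`, `reward`, `min_reliability`, `max_clusters`), and update rules are illustrative assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster: a compact summary of nearby points sharing one label."""
    n: int                    # number of absorbed points
    ls: list[float]           # linear sum of absorbed points
    ss: float                 # sum of squared norms of absorbed points
    label: int
    reliability: float = 1.0  # lowered on errors, raised on correct predictions

    @classmethod
    def from_point(cls, x, label):
        return cls(n=1, ls=list(x), ss=sum(v * v for v in x), label=label)

    def center(self) -> list[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of absorbed points from the centroid.
        c = self.center()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))

    def absorb(self, x) -> None:
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)


class MicroClusterStreamClassifier:
    """Hypothetical nearest-micro-cluster classifier with error-driven maintenance."""

    def __init__(self, max_clusters=100, penalty=0.5, reward=0.1, min_reliability=0.1):
        self.clusters: list[MicroCluster] = []
        self.max_clusters = max_clusters
        self.penalty = penalty                # reliability multiplier after an error
        self.reward = reward                  # reliability increment after a hit
        self.min_reliability = min_reliability

    def _nearest(self, x) -> MicroCluster:
        return min(self.clusters, key=lambda m: math.dist(x, m.center()))

    def predict(self, x):
        return self._nearest(x).label if self.clusters else None

    def update(self, x, y=None) -> None:
        """Learn from one instance; y=None marks an unlabeled instance."""
        if not self.clusters:
            if y is not None:
                self.clusters.append(MicroCluster.from_point(x, y))
            return
        nearest = self._nearest(x)
        if y is None:
            # Unlabeled: absorb only if x lies inside the nearest cluster's radius.
            if math.dist(x, nearest.center()) <= nearest.radius():
                nearest.absorb(x)
            return
        if nearest.label == y:
            nearest.reliability = min(1.0, nearest.reliability + self.reward)
            nearest.absorb(x)
        else:
            # Error: penalize the misleading cluster and start a new one for x.
            nearest.reliability *= self.penalty
            self.clusters.append(MicroCluster.from_point(x, y))
        # Forget clusters that have become unreliable, then enforce the memory budget.
        self.clusters = [m for m in self.clusters if m.reliability >= self.min_reliability]
        while len(self.clusters) > self.max_clusters:
            self.clusters.remove(min(self.clusters, key=lambda m: (m.reliability, m.n)))


# Test-then-train over a tiny stream; None marks an unlabeled instance.
stream = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((5.0, 5.0), 1),
          ((0.05, 0.1), None), ((5.1, 4.9), 1)]
clf = MicroClusterStreamClassifier()
preds = []
for x, y in stream:
    preds.append(clf.predict(x))
    clf.update(x, y)
print(preds)  # [None, 0, 0, 0, 1]
```

The demo loop follows the test-then-train (prequential) protocol common in stream-learning evaluation: each instance is predicted before it is used for an update. Multiplying reliability by `penalty` on every error means a cluster that keeps misclassifying nearby points is forgotten after a few errors, which is one simple way to let new clusters replace stale ones under local drift.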

Updated: 2020-03-24