VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams
Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2021-09-14, DOI: 10.1007/s10618-021-00786-0
Alessio Bernardo, Emanuele Della Valle

The world is constantly changing, and so is the massive amount of data produced. However, only a few studies deal with online class imbalance learning, which combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy, to be prepended to any streaming machine learning classification algorithm, that oversamples the minority class using a new version of SMOTE and Borderline-SMOTE inspired by data sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We bring statistical evidence that VFC-SMOTE pipelines learn models whose minority class performance is better than the state of the art. Moreover, we analyze the time/memory consumption and the concept drift recovery speed.
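
For readers unfamiliar with the oversampling idea the abstract builds on, the following is a minimal sketch of the classic, batch SMOTE interpolation step: a synthetic minority point is placed on the segment between a minority sample and one of its nearest minority neighbours. It is not the VFC-SMOTE algorithm, whose sketch-based, streaming variant is not described in the abstract. The function name `smote_sample` and its parameters `k` and `rng` are illustrative assumptions.

```python
import random


def smote_sample(minority, k=3, rng=None):
    """Return one synthetic point via classic SMOTE interpolation.

    `minority` is a list of equal-length numeric tuples (minority-class
    samples). This is the textbook batch step, NOT the VFC-SMOTE method.
    """
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")

    # Pick a base sample by index so duplicates are handled correctly.
    i = rng.randrange(len(minority))
    base = minority[i]

    # Rank the remaining minority samples by squared Euclidean distance.
    others = [p for j, p in enumerate(minority) if j != i]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))

    # Choose one of the k nearest neighbours and interpolate towards it.
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # uniform in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote_sample(points, k=3, rng=random.Random(0)))
```

Because the new point is a convex combination of two existing minority samples, it always lies between them; a streaming variant would additionally have to maintain the neighbour information incrementally rather than from a stored list, which this sketch does not attempt.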




Updated: 2021-09-15