Mining frequent itemsets from streaming transaction data using genetic algorithms
Journal of Big Data (IF 8.6), Pub Date: 2020-07-25, DOI: 10.1186/s40537-020-00330-9
Sikha Bagui , Patrick Stanley

This paper studies the mining of frequent itemsets from streaming data in the presence of concept drift. Streaming data is volatile by nature, which makes it particularly challenging to mine. An approach based on genetic algorithms is presented, and the relationships between concept drift, sliding window size, and genetic algorithm constraints are explored. Concept drift is identified by changes in the frequent itemsets. The novelty of this work lies in using frequent itemsets, within a genetic algorithm framework, to detect concept drift while mining streaming data. Formulas are presented for calculating minimum support counts in streaming data with sliding windows. Testing showed that the ratio of the window size to the number of transactions per drift was key to good performance. Good results were hard to obtain when the sliding window was too small, since normal fluctuations in the data could then appear to be concept drift. Window size must be managed together with the support and confidence values to achieve reasonable results. This method of detecting concept drift performed well with larger window sizes.
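The idea of flagging concept drift through changes in the frequent itemsets of a sliding window can be illustrated with a minimal sketch. It is not the paper's method: the genetic algorithm search is replaced here by plain enumeration of itemsets up to a small length, the minimum support count is assumed to be the relative support times the window size, rounded up (the abstract does not state the paper's formulas), and drift is assumed to be signalled when the Jaccard distance between two sets of frequent itemsets exceeds a threshold. All function names and the `drift_threshold` parameter are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations
from math import ceil


def min_support_count(min_support, window_size):
    # Assumed rule (not taken from the paper): the smallest whole number of
    # transactions that satisfies the relative support within one window.
    return max(1, ceil(min_support * window_size))


def frequent_itemsets(window, min_support, max_len=2):
    # Brute-force stand-in for the genetic algorithm search: count every
    # itemset of up to max_len items and keep those meeting the threshold.
    threshold = min_support_count(min_support, len(window))
    counts = Counter()
    for transaction in window:
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    return {itemset for itemset, count in counts.items() if count >= threshold}


def jaccard_distance(a, b):
    # 0.0 means identical sets of itemsets, 1.0 means no itemset in common.
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def detect_drift(stream, window_size, min_support, drift_threshold=0.5):
    # Slide a fixed-size window over the stream and record the transaction
    # index whenever the frequent itemsets differ from the reference set by
    # more than drift_threshold (hypothetical criterion).
    window = deque(maxlen=window_size)
    reference = None
    drifts = []
    for index, transaction in enumerate(stream):
        window.append(transaction)
        if len(window) < window_size:
            continue
        current = frequent_itemsets(window, min_support)
        if reference is None:
            reference = current
        elif jaccard_distance(reference, current) > drift_threshold:
            drifts.append(index)
            reference = current
    return drifts
```

Calling `detect_drift(stream, window_size, min_support)` on a list of transactions returns the indices at which a drift was flagged. In this sketch, a small `window_size` relative to the number of transactions between drifts makes the frequent itemsets more sensitive to ordinary fluctuations, which is the sensitivity the abstract describes.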

Updated: 2020-07-25