Anytime Frequent Itemset Mining of Transactional Data Streams
Big Data Research (IF 3.5), Pub Date: 2020-07-28, DOI: 10.1016/j.bdr.2020.100146
Poonam Goyal, Jagat Sesh Challa, Shivin Shrivastava, Navneet Goyal

Mining frequent itemsets from transactional data streams has become essential, with applications such as stock market analysis, retail chain analysis, and web log analysis. Various algorithms have been proposed to efficiently mine single-port and multi-port transactional streams under limited time and memory. However, all of them are budget algorithms: they cannot handle a varying inter-arrival rate of transactions or high-speed streams. They are constrained by a maximum limit on the transaction arrival rate, beyond which they fail to process the stream. These algorithms also cannot give immediate mining results on demand, even at the cost of reduced accuracy. Handling variable arrival rates and returning an immediate, possibly approximate, result are the two properties that characterize an anytime algorithm. We propose AnyFI, the first anytime frequent itemset mining algorithm for data streams. AnyFI uses a novel data structure, the BFI-forest, which can handle transactions arriving at a variable rate. It maintains itemsets in the BFI-forest in such a way that it can give a mining result almost immediately when the time allowance for mining is very small, and can refine its accuracy as the time allowance increases. We also propose MPAnyFI, which extends AnyFI into a parallel framework for anytime frequent itemset mining of multi-port data streams over commodity clusters. It uses AnyFI at each computing node of the cluster. Our extensive experimental analysis shows that AnyFI can handle high stream speeds close to 60,000 trans/sec with recall close to 100%. The experiments also show the efficiency of MPAnyFI.
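
To make the anytime idea concrete, here is a minimal Python sketch of a miner that returns a coarse answer when the time allowance is tiny and a more complete one when it is larger. It is an assumption-laden illustration, not the authors' method: it uses a plain level-wise (Apriori-style) search over a stored window instead of the BFI-forest, and the function name `anytime_frequent_itemsets` and its parameters are hypothetical.

```python
import time
from collections import Counter
from itertools import combinations


def anytime_frequent_itemsets(transactions, min_support, time_allowance):
    """Return {itemset: support} for itemsets found within the allowance.

    Hypothetical illustration only: a level-wise search over a stored
    window of transactions, NOT the BFI-forest structure used by AnyFI.
    `time_allowance` is in seconds.
    """
    deadline = time.monotonic() + time_allowance
    window = [frozenset(t) for t in transactions]

    # Level 1 is always computed, so even a zero allowance yields a result.
    counts = Counter(item for t in window for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)

    size = 2
    # Refine with larger itemsets only while time remains.
    while frequent and time.monotonic() < deadline:
        items = sorted({i for s in frequent for i in s})
        level = {}
        for cand in combinations(items, size):
            if time.monotonic() >= deadline:
                break  # out of time: keep whatever was found so far
            cand = frozenset(cand)
            support = sum(1 for t in window if cand <= t)
            if support >= min_support:
                level[cand] = support
        result.update(level)
        frequent = level
        size += 1
    return result


if __name__ == "__main__":
    tx = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    quick = anytime_frequent_itemsets(tx, min_support=3, time_allowance=0.0)
    full = anytime_frequent_itemsets(tx, min_support=3, time_allowance=5.0)
    print(len(quick), "itemsets with no time allowance")
    print(len(full), "itemsets with a generous time allowance")
```

With a zero allowance the sketch reports only frequent single items; with more time it adds larger itemsets, so the quick answer is always a subset of the refined one. That mirrors the trade-off the abstract describes, though the paper's data structure and its accuracy guarantees are not reproduced here.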




Updated: 2020-07-28