PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining, Multimedia Systems



Multimedia Systems (IF 3.5), Pub Date: 2021-03-13, DOI: 10.1007/s00530-020-00725-x
Mao Yimin, Geng Junhao, Deborah Simon Mwakapesa, Yaser Ahangari Nanehkaran, Zhang Chi, Deng Xiaoheng, Chen Zhigang

Frequent itemset mining (FIM) is an important data mining technique that is widely used in many applications to discover frequent items. As datasets have grown rapidly, FIM has attracted many researchers and prompted numerous FIM algorithms for the big data environment. This study designs an optimized parallel frequent itemset mining algorithm based on MapReduce, named PFIMD, to address two problems in existing parallel FIM algorithms: the time and space complexity of processing and computing itemsets, and the failure to adequately balance the load among parallel tasks. First, a structure called DiffNodeset is adopted to effectively avoid the growth of N-list cardinality in the MRPrePost algorithm. Then, a 2-way comparison strategy is designed to speed up DiffNodeset generation for 2-itemsets and reduce the time complexity of the algorithm. Next, the steps of the improved algorithm are parallelized using the cloud computing platform Hadoop and the MapReduce programming model. Moreover, to group the items in the F-list uniformly, a load balancing strategy based on dynamic grouping is proposed, which addresses the uneven load across the nodes in the cluster. The experimental results show that the modified algorithm not only overcomes the shortcomings of MRPrePost in the big data environment, but also greatly reduces time and space complexity. Finally, specific applications of PFIMD to several multimedia datasets are listed to illustrate its generality.
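
The abstract does not spell out how the F-list is built or how the dynamic grouping works, so the following Python sketch is only an illustration of the general idea, not the authors' method. It counts item support in a single process (standing in for a MapReduce counting job), keeps the frequent items as an F-list sorted by descending support, and then spreads those items over groups with a generic greedy heuristic that always gives the heaviest remaining item to the least loaded group. The function names `build_f_list` and `group_items`, the `load` parameter, and the use of raw support as the load estimate are all assumptions made for this example.

```python
from collections import Counter
import heapq


def build_f_list(transactions, min_support):
    """Return frequent items as (item, support) pairs, highest support first.

    Hypothetical helper: a single-process stand-in for a MapReduce
    counting job (map emits items, reduce sums their occurrences).
    """
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    frequent = [(item, n) for item, n in counts.items() if n >= min_support]
    # Sort by descending support; break ties by item name for determinism.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


def group_items(f_list, num_groups, load=lambda position, support: support):
    """Assign F-list items to groups so that estimated loads stay balanced.

    Assumption: the load of an item is estimated by `load(position, support)`,
    which defaults to the raw support. The paper's own estimate may differ.
    """
    heap = [(0, g) for g in range(num_groups)]  # (current load, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    weighted = sorted(
        ((load(pos, sup), item) for pos, (item, sup) in enumerate(f_list)),
        key=lambda pair: (-pair[0], pair[1]),
    )
    for weight, item in weighted:
        total, g = heapq.heappop(heap)  # least loaded group so far
        groups[g].append(item)
        heapq.heappush(heap, (total + weight, g))
    return groups


if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "d"]]
    f_list = build_f_list(transactions, min_support=2)
    print(f_list)
    print(group_items(f_list, num_groups=2))
```

In a Hadoop setting each group would correspond to the items handled by one reducer; a different `load` function can be passed in to experiment with other cost estimates.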


Updated: 2021-03-15
