Distributed load balancing frequent colossal closed itemset mining algorithm for high dimensional dataset
Journal of Parallel and Distributed Computing ( IF 3.8 ) Pub Date : 2020-06-04 , DOI: 10.1016/j.jpdc.2020.05.017
Manjunath K Vanahalli , Nagamma Patil

Extracting colossal closed itemsets from high dimensional biological datasets has attracted considerable attention in recent times. A massive set of short and average-sized mined itemsets does not carry complete and valuable information for decision making, yet traditional itemset mining algorithms spend an enormous amount of time mining exactly such itemsets. Growing research interest in bioinformatics and the abundance of data across a variety of domains have led to the generation of high dimensional datasets, which are characterized by a large number of features and a small number of rows. Colossal closed itemsets are significant for numerous applications, including bioinformatics, and are influential in decision making. Extracting a large amount of information and knowledge from a high dimensional dataset is a nontrivial task. The existing colossal closed itemset mining algorithms for high dimensional datasets are sequential and computationally expensive. Distributed and parallel computing is a good strategy for overcoming the inefficiency of the existing sequential algorithms. The Balanced Distributed Parallel Frequent Colossal Closed Itemset Mining (BDPFCCIM) algorithm is designed for high dimensional datasets. It includes an efficient closedness checking method for rowsets and an efficient pruning strategy that trims the row enumeration search space. The proposed BDPFCCIM algorithm is the first distributed load balancing algorithm to mine frequent colossal closed itemsets from high dimensional biological datasets. The experimental results demonstrate the efficient performance of the proposed BDPFCCIM algorithm in comparison with state-of-the-art algorithms.
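To make the terms in the abstract concrete, the following is a minimal sketch of row enumeration with a closedness check. It uses the textbook definition of a closed itemset, not the BDPFCCIM algorithm itself. The function names, the toy dataset, and the `min_support` and `min_length` parameters are all assumptions introduced here for illustration. The paper's pruning strategy and distributed load balancing are not reproduced, so this version enumerates rowsets exhaustively and is only practical for tiny inputs.

```python
from itertools import combinations


def common_items(dataset, rowset):
    """Return the items shared by every row in the rowset."""
    rows = [dataset[r] for r in rowset]
    return frozenset.intersection(*rows)


def supporting_rows(dataset, itemset):
    """Return the ids of all rows that contain every item in the itemset."""
    return frozenset(r for r, items in dataset.items() if itemset <= items)


def is_closed_rowset(dataset, rowset):
    """A rowset is closed when no row outside it contains all of its common items."""
    itemset = common_items(dataset, rowset)
    return bool(itemset) and supporting_rows(dataset, itemset) == frozenset(rowset)


def colossal_closed_itemsets(dataset, min_support, min_length):
    """Enumerate rowsets exhaustively (no pruning) and keep the closed ones
    whose itemset has at least min_length items. Assumes min_support >= 1."""
    results = {}
    row_ids = sorted(dataset)
    for size in range(min_support, len(row_ids) + 1):
        for rowset in combinations(row_ids, size):
            if is_closed_rowset(dataset, rowset):
                itemset = common_items(dataset, rowset)
                if len(itemset) >= min_length:
                    results[itemset] = frozenset(rowset)
    return results


# Hypothetical toy dataset: few rows, each mapped to a frozenset of items.
toy_dataset = {
    1: frozenset("abcde"),
    2: frozenset("abcdf"),
    3: frozenset("abcg"),
    4: frozenset("xy"),
}

found = colossal_closed_itemsets(toy_dataset, min_support=2, min_length=3)
for itemset, rows in found.items():
    print(sorted(itemset), "supported by rows", sorted(rows))
```

Enumerating over rows rather than items suits datasets with many features and few rows, because the search space grows with the number of rows. A practical miner would still need pruning to avoid visiting every rowset.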




Updated: 2020-06-04