An efficient projection-based method for high utility itemset mining using a novel pruning approach on the utility matrix
Knowledge and Information Systems (IF 2.5), Pub Date: 2020-07-31, DOI: 10.1007/s10115-020-01485-w
Mohammad Karim Sohrabi

High utility itemset mining is an important extension of frequent itemset mining that treats the unit profits and quantities of items as external and internal utilities, respectively. Because the utility function does not have the downward closure property, an overestimate of utility is obtained from an anti-monotonic upper bound of the utility function, and this overestimate is used to prune the search space and improve the efficiency of high utility itemset mining methods. Transaction-weighted utilization (TWU) of an itemset was the first, and remains one of the most important, functions used as an anti-monotonic upper bound of utility by various algorithms. A variety of high utility itemset mining methods have attempted to tighten the utility upper bound and have exploited appropriate pruning strategies to improve mining efficiency. Although TWU and its improved alternatives increase efficiency by pruning the search space, they still generate a significant number of candidates that are high-TWU but are not high utility itemsets. Calculating the actual utilities of these low utility candidates requires multiple scans of the dataset and thus imposes a large overhead on the mining methods, which can cancel out the pruning benefits of the upper bounds. Proposing appropriate pruning strategies, exploiting efficient data structures, and using tight anti-monotonic upper bounds can overcome this problem and lead to significant performance improvements in high utility itemset mining methods. In this paper, a new projection-based method called MAHI (matrix-aided high utility itemset mining) is introduced, which uses a novel utility matrix-based pruning strategy, called MA-prune, to improve high utility itemset mining performance in terms of execution time. The experimental results show that MAHI is faster than former algorithms.
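
The standard definitions the abstract relies on (itemset utility and TWU) can be illustrated with a minimal Python sketch. It is not MAHI or MA-prune, whose details the abstract does not give; the toy transactions, the profit table, the threshold, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: unit profit per item (external utility).
profit = {"a": 5, "b": 2, "c": 1}

# Each transaction maps item -> purchased quantity (internal utility).
transactions = [
    {"a": 1, "b": 2},
    {"b": 4, "c": 3},
    {"a": 2, "c": 1},
]


def transaction_utility(t):
    """Total utility of one transaction: sum of profit * quantity."""
    return sum(profit[item] * qty for item, qty in t.items())


def utility(itemset, db):
    """Exact utility of an itemset over the transactions that contain it."""
    return sum(
        sum(profit[item] * t[item] for item in itemset)
        for t in db
        if all(item in t for item in itemset)
    )


def twu(itemset, db):
    """Transaction-weighted utilization: an upper bound on utility."""
    return sum(
        transaction_utility(t)
        for t in db
        if all(item in t for item in itemset)
    )


min_util = 10  # assumed threshold for this example

# Utility is not downward closed: a superset can beat its subset.
print(utility({"c"}, transactions), utility({"b", "c"}, transactions))

# TWU is anti-monotonic: it never grows when items are added.
print(twu({"c"}, transactions), twu({"b", "c"}, transactions))

# {"c"} passes the TWU test but is not a high utility itemset,
# so its exact utility must still be computed before it is rejected.
print(twu({"c"}, transactions) >= min_util,
      utility({"c"}, transactions) >= min_util)
```

In this toy dataset, {"c"} is the kind of high-TWU, low-utility candidate the abstract describes: the bound cannot prune it, so an exact utility computation is still needed to discard it.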




Updated: 2020-07-31