Mining high utility itemsets using extended chain structure and utility machine
Knowledge-Based Systems (IF 8.8), Pub Date: 2020-09-16, DOI: 10.1016/j.knosys.2020.106457
Jun-Feng Qu, Philippe Fournier-Viger, Mengchi Liu, Bo Hang, Feng Wang

High utility itemsets are sets of items that have a high utility (e.g., a high profit or a high importance) in a transaction database. Discovering high utility itemsets has many important real-life applications, such as market basket analysis. Nonetheless, mining these patterns is time-consuming because of the huge search space and the high cost of utility computation. Most previous work is devoted to search space pruning but pays little attention to utility computation. In fact, both search space pruning and high utility itemset identification rely on the computation of various utilities. This paper proposes a novel algorithm named REX (Rapid itEmset eXtraction), which extends the classic d2HUP algorithm with an improved structure, a k-item utility machine, and an efficient switch strategy. The structure significantly reduces the time complexity of utility computation compared with the original structure used in d2HUP. The machine quickly merges identical transactions and applies an efficient procedure for computing the utilities of the extensions of a given itemset. The switch strategy, derived by trial and error, yields drastic performance improvements on some databases and remains competitive with the switch strategy used in d2HUP on the others. Experimental results show that REX achieves a speedup ranging from fifty percent to three orders of magnitude over d2HUP, even though the two algorithms use identical pruning techniques, and that REX considerably outperforms state-of-the-art algorithms on real-life and synthetic databases.
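
To make the notions of utility and transaction merging in the abstract concrete, here is a minimal Python sketch. It is not REX or d2HUP: it enumerates every itemset by brute force, with no pruning, and is meant only to show what is being computed. The toy database, the unit profits, and all function names (`total_utility`, `high_utility_itemsets`, `merge_identical`) are assumptions made for illustration. In particular, `merge_identical` merges whole transactions that contain the same items, which is a simplified stand-in for the merging the abstract attributes to the utility machine.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
    {"a": 1, "b": 2, "c": 1},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def total_utility(itemset, db):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in db:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(db, min_utility):
    """Brute-force enumeration: return {itemset: utility} for utility >= min_utility."""
    items = sorted(unit_profit)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = total_utility(itemset, db)
            if u >= min_utility:
                result[itemset] = u
    return result


def merge_identical(db):
    """Merge transactions that contain exactly the same items by summing quantities."""
    merged = {}
    for t in db:
        acc = merged.setdefault(frozenset(t), {})
        for item, qty in t.items():
            acc[item] = acc.get(item, 0) + qty
    return list(merged.values())


print(high_utility_itemsets(transactions, 20))
print(len(transactions), "->", len(merge_identical(transactions)), "transactions")
```

With a minimum utility of 20 in this toy database, {a, b} has utility 18 and is not a high utility itemset, while its superset {a, b, c} has utility 20 and is. Utility is therefore not anti-monotone, which is one reason the search space is hard to prune. Merging the two identical transactions leaves every itemset utility unchanged while reducing the number of transactions to scan.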




Updated: 2020-09-16