A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure

The Journal of Supercomputing

Abstract

High-utility itemset mining is a prominent data-mining technique in which the profit or weight of itemsets plays a crucial role in defining meaningful patterns. High average-utility itemset (HAUI) mining is an advancement over high-utility itemset mining that introduces an unbiased measure, called average utility, to relate the utility of an itemset to its length. Several existing HAUI mining algorithms use upper bounds such as the average-utility upper bound, the revised tighter upper bound, and the looser upper bound to enable pruning. However, these upper bounds overestimate the average utility of itemsets and slow down the mining process. This paper presents a fast high average-utility itemset miner (FHAIM) algorithm, which uses two improved upper bounds and several efficient pruning strategies to avoid processing unpromising candidate itemsets. Moreover, a novel list structure named recommended average-utility list (RAUL) is presented to store the average utility and the information required for pruning. The RAUL for an itemset can be constructed by joining the RAULs of its subsets, which avoids excessive database scans. We performed extensive experiments on various benchmark datasets to evaluate the performance of FHAIM in comparison with two existing HAUI mining algorithms. Experimental results show that FHAIM outperforms the existing HAUI mining algorithms in terms of runtime, memory usage, join counts, and scalability.
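To make the two central quantities concrete, the following minimal Python sketch computes the average utility of an itemset and the classic average-utility upper bound (auub) on a toy database. It assumes the usual formulation from the HAUI literature, in which auub is the sum of the maximum item utility of each supporting transaction. The database `DB` and the function names are hypothetical, and the sketch does not reproduce the two improved bounds or the RAUL structure of FHAIM, which the abstract does not specify.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> utility
# (quantity times unit profit, already multiplied out).
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]


def average_utility(itemset, db):
    """Utility of the itemset summed over supporting transactions, divided by its length."""
    items = set(itemset)
    total = sum(sum(t[i] for i in items) for t in db if items.issubset(t))
    return total / len(items)


def auub(itemset, db):
    """Assumed classic bound: sum of the maximum item utility of each supporting transaction."""
    items = set(itemset)
    return sum(max(t.values()) for t in db if items.issubset(t))


def all_itemsets(db):
    """Enumerate every non-empty itemset over the items that occur in db."""
    universe = sorted(set().union(*db))
    for k in range(1, len(universe) + 1):
        yield from combinations(universe, k)


if __name__ == "__main__":
    for x in all_itemsets(DB):
        print(x, average_utility(x, DB), auub(x, DB))
```

For the itemset {a, c}, the supporting transactions are the first two, so the average utility is (6 + 16) / 2 = 11 while auub is 5 + 10 = 15. The gap between the two values is the kind of overestimation that tighter bounds aim to reduce.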

Acknowledgements

This research work is supported by the Indian Institute of Technology (ISM), Dhanbad, Govt. of India. The authors thank the Department of Computer Science & Engineering, Indian Institute of Technology (ISM), Dhanbad, India, for its research support.

Author information

Corresponding author

Correspondence to Dharavath Ramesh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sethi, K.K., Ramesh, D. A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure. J Supercomput 76, 10288–10318 (2020). https://doi.org/10.1007/s11227-020-03247-5

  • DOI: https://doi.org/10.1007/s11227-020-03247-5
