High Occupancy Itemset Mining with Consideration of Transaction Occupancy

  • Research Article - Computer Engineering and Computer Science
  • Arabian Journal for Science and Engineering

Abstract

Discovering high occupancy itemsets is an interesting area of research in data mining. Traditional approaches compute occupancy only from the portion of each supporting transaction that the itemset occupies. As a result, they cannot distinguish between the occupancies of the same itemset in different supporting transactions of equal length. Moreover, occupancy reaches its maximum whenever the itemset size equals the transaction length, which promotes the generation of undesirable itemsets, especially isolated ones. Furthermore, itemsets of equal size receive equal average occupancies even though they appear in different transactions of equal length. To address these issues, this paper introduces the concept of transaction occupancy (TO) and then presents a new computational model of itemset occupancy (IO) that takes transaction occupancy into account. Transaction occupancy refers to the portion of the database occupied by the transactions. The paper proposes an efficient list-structure-based algorithm called HOIMTO (high occupancy itemset mining with transaction occupancy) to discover high occupancy itemsets (HOIs) in transactional databases. A new itemset occupancy upper bound (IOUB) is also introduced to reduce the candidate search space. Experimental studies show the effectiveness of the proposed approach in terms of itemset generation, runtime, memory usage and scalability.
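The limitations of traditional occupancy described in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definition from the occupancy literature, in which the occupancy of an itemset X in a supporting transaction T is |X| / |T| and the itemset's occupancy is the average over its supporting transactions. The function names and the toy database are hypothetical. The `transaction_share` helper is only one plausible reading of "the portion of the database occupied by the transactions"; the abstract does not give the paper's exact TO or IO formulas, so the sketch does not attempt to reproduce them.

```python
from fractions import Fraction


def supporting_transactions(itemset, database):
    """Return the transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return [t for t in database if items <= frozenset(t)]


def average_occupancy(itemset, database):
    """Traditional occupancy: mean of |X| / |T| over supporting transactions."""
    support = supporting_transactions(itemset, database)
    if not support:
        return Fraction(0)
    size = len(frozenset(itemset))
    return sum(Fraction(size, len(set(t))) for t in support) / len(support)


def transaction_share(transaction, database):
    """ASSUMPTION (not the paper's formula): a transaction's share of all
    item occurrences in the database."""
    total = sum(len(set(t)) for t in database)
    return Fraction(len(set(transaction)), total)


# Hypothetical toy database, used only for illustration.
database = [
    {"a", "b"},
    {"a", "b", "c", "d"},
    {"c", "d", "e", "f"},
    {"x", "y"},
]

# Equal-size itemsets in different transactions of equal length tie.
print(average_occupancy({"c", "d"}, database))  # 1/2
print(average_occupancy({"e", "f"}, database))  # 1/2

# An isolated itemset that fills its only transaction gets the maximum.
print(average_occupancy({"x", "y"}, database))  # 1

print(transaction_share(database[0], database))  # 1/6
```

In this toy data, {c, d} and {e, f} receive the same score although they occur in different transactions, and the isolated itemset {x, y} receives the highest possible occupancy. These are the behaviors the abstract identifies as problematic.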

Notes

  1. https://www.yatra.com/india-tour-packages/colourful-rajasthan.

  2. http://fimi.uantwerpen.be/data/.

  3. https://github.com/stedy/Machine-Learning-with-R-dataset/blob/master/groceries.csv.

Author information

Contributions

SD developed the concept, designed the algorithms and wrote the manuscript. KM supervised the experimental analysis and data analysis. UG arranged the resources and implemented the algorithms in Python for the experiments. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Subrata Datta.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Datta, S., Mali, K. & Ghosh, U. High Occupancy Itemset Mining with Consideration of Transaction Occupancy. Arab J Sci Eng 47, 2061–2075 (2022). https://doi.org/10.1007/s13369-021-06075-8

  • DOI: https://doi.org/10.1007/s13369-021-06075-8
