Abstract
Discovering high occupancy itemsets is an interesting area of research in data mining. Traditional approaches compute occupancy only from the portion of each supporting transaction that the itemset occupies. They therefore cannot distinguish between the occupancies of the same itemset in different supporting transactions of equal length. When the itemset size equals the transaction length, occupancy reaches its maximum, which promotes the generation of undesirable itemsets, especially isolated ones. Furthermore, itemsets of equal size receive equal average occupancies even though they appear in different transactions of equal length. To address these issues, this paper introduces the concept of transaction occupancy (TO) and then presents a new computational model of itemset occupancy (IO) that takes transaction occupancy into account. Transaction occupancy refers to the portion of the database occupied by a transaction. The paper proposes an efficient list-structure-based algorithm called HOIMTO (high occupancy itemset mining with transaction occupancy) to discover high occupancy itemsets (HOIs) from transactional databases. A new itemset occupancy upper bound (IOUB) is also introduced to reduce the candidate search space. Experimental studies show the effectiveness of the proposed approach in terms of itemset generation, runtime, memory usage and scalability.
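The limitations of the traditional measure can be illustrated with a minimal Python sketch. It assumes the commonly used definition in which the occupancy of an itemset X is the average of |X| / |T| over its supporting transactions T. The function `transaction_occupancy` is only one plausible reading of "the portion of the database occupied by a transaction" (the transaction's length divided by the total number of item occurrences in the database). It is an assumption made for illustration, not the paper's formula, and the toy database `db` and all function names are hypothetical.

```python
from fractions import Fraction


def traditional_occupancy(itemset, transactions):
    """Average of |X| / |T| over the transactions T that contain X."""
    x = frozenset(itemset)
    supporting = [t for t in transactions if x <= set(t)]
    if not supporting:
        return Fraction(0)
    return sum(Fraction(len(x), len(t)) for t in supporting) / len(supporting)


def transaction_occupancy(transaction, transactions):
    """ASSUMED reading of TO: share of all item occurrences held by one transaction."""
    total_items = sum(len(t) for t in transactions)
    return Fraction(len(transaction), total_items)


# Hypothetical toy database of four transactions.
db = [
    {"a", "b", "c", "d"},
    {"a", "b", "e", "f"},
    {"a", "b"},
    {"g"},
]

# An isolated itemset that fills its only transaction gets the maximum value.
print(traditional_occupancy({"g"}, db))       # 1
# A widely supported itemset scores lower.
print(traditional_occupancy({"a", "b"}, db))  # 2/3
# Equal-size itemsets in different transactions of equal length are tied.
print(traditional_occupancy({"c", "d"}, db), traditional_occupancy({"e", "f"}, db))
# Under the assumed reading, the isolated transaction occupies little of the database.
print(transaction_occupancy(db[3], db))       # 1/11
```

In this toy database the isolated itemset {g} outranks {a, b} under the traditional measure, which is the kind of undesirable result the abstract describes. A transaction-level weight such as the assumed one above would give the single-item transaction only a small share of the database.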
Author information
Contributions
SD developed the concept, designed the algorithms and wrote the manuscript. KM supervised the experimental analysis and data analysis. UG arranged the resources and implemented the algorithms in Python for the experiments. All authors have read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Datta, S., Mali, K. & Ghosh, U. High Occupancy Itemset Mining with Consideration of Transaction Occupancy. Arab J Sci Eng 47, 2061–2075 (2022). https://doi.org/10.1007/s13369-021-06075-8
DOI: https://doi.org/10.1007/s13369-021-06075-8