
Effective sanitization approaches to protect sensitive knowledge in high-utility itemset mining

Applied Intelligence

Abstract

For mutual benefit, data is shared among business organizations. However, this may result in privacy and security threats. To address this issue, privacy-preserving data mining is presented to sanitize the original database to hide all sensitive knowledge. Privacy-preserving utility mining is an extension of privacy-preserving data mining, the objective of which is to hide all sensitive high-utility itemsets and minimize the side effects on non-sensitive knowledge caused by the sanitization process. In this paper, three heuristic algorithms for privacy-preserving utility mining are proposed, namely, Selecting Maximum Utility item first (SMAU), Selecting Minimum Utility item first (SMIU) and Selecting Minimum Side Effects item first (SMSE). The quality of the database is well maintained because all of the proposed algorithms consider the side effects on the non-sensitive itemsets. Furthermore, to avoid performing multiple database scans, two table structures, T-table and HUI-table, are adopted to accelerate the hiding process by only scanning the database twice. The experimental results show that the proposed approaches successfully conceal all sensitive itemsets with fewer distortions of non-sensitive knowledge. Moreover, the influence of the database density on the proposed approaches is observed.
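The hiding task described in the abstract can be illustrated with a minimal sketch. It assumes the standard high-utility itemset mining definitions: each item has a unit profit, each transaction records purchased quantities, and the utility of an itemset is summed over the transactions that contain all of its items. The abstract does not specify how SMAU, SMIU or SMSE work internally, so the greedy rule below (lower the quantity of the item that contributes the most utility) is only an assumption loosely modelled on the "maximum utility item first" name. The function names `itemset_utility` and `hide_itemset` are hypothetical, and the sketch uses neither the T-table nor the HUI-table and does not measure side effects on non-sensitive itemsets.

```python
from typing import Dict, Iterable, List

Transaction = Dict[str, int]  # item -> purchased quantity (assumed positive)


def itemset_utility(db: List[Transaction], profit: Dict[str, int],
                    itemset: Iterable[str]) -> int:
    """Sum quantity * unit profit over transactions containing every item."""
    items = list(itemset)
    return sum(
        sum(t[i] * profit[i] for i in items)
        for t in db
        if all(i in t for i in items)
    )


def hide_itemset(db: List[Transaction], profit: Dict[str, int],
                 itemset: Iterable[str], min_util: int) -> List[Transaction]:
    """Return a sanitized copy of db in which itemset's utility < min_util.

    Illustrative greedy rule (an assumption, not the paper's algorithm):
    pick the (transaction, item) pair with the largest utility contribution,
    then lower its quantity just enough, or delete the item if that is
    not enough.
    """
    if min_util <= 0:
        raise ValueError("min_util must be positive")
    items = list(itemset)
    out = [dict(t) for t in db]  # work on a copy; the input is left untouched
    while True:
        # Utility that must still be removed to fall strictly below min_util.
        excess = itemset_utility(out, profit, items) - min_util + 1
        if excess <= 0:
            return out
        # Largest single contribution among supporting transactions.
        _, idx, item = max(
            (t[i] * profit[i], idx, i)
            for idx, t in enumerate(out)
            if all(j in t for j in items)
            for i in items
        )
        units = -(-excess // profit[item])  # ceiling division
        if units < out[idx][item]:
            out[idx][item] -= units  # a quantity reduction suffices
        else:
            del out[idx][item]  # the transaction no longer supports the itemset
```

With unit profits `{"a": 5, "b": 2, "c": 1}` and a threshold of 20, the itemset `{a, b}` in the transactions `{a: 2, b: 3}`, `{a: 1, b: 1, c: 4}` and `{b: 2, c: 1}` has utility 23. The sketch lowers the quantity of `a` in the first transaction by one unit, which brings the utility to 18. A side-effect-aware method, as the abstract describes, would also check how such a change affects the utilities of non-sensitive itemsets before choosing the victim item.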



Acknowledgements

This work is supported by the National Natural Science Foundation of China (61802344); Zhejiang Provincial Natural Science Foundation of China (LY16F030012); Ningbo Natural Science Foundation of China (2017A610118); General Scientific Research Projects of Zhejiang Education Department (Y201534788) and Youth Foundation for Humanities and Social Sciences Research of Ministry of Education of China (16YJCZH112).

Author information

Corresponding author

Correspondence to Xuan Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, X., Wen, S. & Zuo, W. Effective sanitization approaches to protect sensitive knowledge in high-utility itemset mining. Appl Intell 50, 169–191 (2020). https://doi.org/10.1007/s10489-019-01524-2


  • DOI: https://doi.org/10.1007/s10489-019-01524-2
