Abstract
Business organizations share data for mutual benefit, but sharing can expose them to privacy and security threats. Privacy-preserving data mining addresses this issue by sanitizing the original database so that all sensitive knowledge is hidden. Privacy-preserving utility mining extends this idea: its objective is to hide all sensitive high-utility itemsets while minimizing the side effects that the sanitization process has on non-sensitive knowledge. In this paper, three heuristic algorithms for privacy-preserving utility mining are proposed, namely Selecting Maximum Utility item first (SMAU), Selecting Minimum Utility item first (SMIU) and Selecting Minimum Side Effects item first (SMSE). All three algorithms take the side effects on non-sensitive itemsets into account, so the quality of the database is well maintained. Furthermore, two table structures, T-table and HUI-table, are adopted to avoid repeated database scans; with them, the hiding process requires only two scans of the database. The experimental results show that the proposed approaches successfully conceal all sensitive itemsets with fewer distortions of non-sensitive knowledge. The influence of database density on the proposed approaches is also examined.
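The abstract does not spell out how utilities are computed or how an itemset is hidden, so the following is only a minimal sketch of the general idea, not the SMAU, SMIU or SMSE algorithms themselves. It assumes the common definition in which an item's utility in a transaction is its quantity times its unit profit, and it hides one sensitive itemset by greedily lowering the quantity of its highest-utility item. All names (`itemset_utility`, `hide_by_max_utility_item`, the toy `profit` table and `db`) are hypothetical, and the T-table and HUI-table structures are not modeled.

```python
import math
from typing import Dict, Iterable, List

Transaction = Dict[str, int]  # item -> purchased quantity


def itemset_utility(db: List[Transaction], profit: Dict[str, int],
                    itemset: Iterable[str]) -> int:
    """Sum of quantity * unit profit over transactions containing the whole itemset."""
    items = sorted(itemset)
    return sum(
        sum(t[i] * profit[i] for i in items)
        for t in db
        if all(i in t for i in items)
    )


def hide_by_max_utility_item(db: List[Transaction], profit: Dict[str, int],
                             sensitive: Iterable[str], min_util: int) -> List[Transaction]:
    """Return a sanitized copy of db in which u(sensitive) < min_util.

    Assumption (not from the paper): the victim is always the occurrence of a
    sensitive item with the largest utility among supporting transactions.
    """
    if min_util <= 0:
        raise ValueError("min_util must be positive")
    items = sorted(sensitive)            # sorted for deterministic tie-breaking
    out = [dict(t) for t in db]          # never modify the caller's database
    while True:
        # Smallest utility reduction that pushes the itemset below min_util.
        excess = itemset_utility(out, profit, items) - min_util + 1
        if excess <= 0:
            return out
        supporting = [t for t in out if all(i in t for i in items)]
        t, item = max(
            ((t, i) for t in supporting for i in items),
            key=lambda pair: pair[0][pair[1]] * profit[pair[1]],
        )
        units = math.ceil(excess / profit[item])
        if units < t[item]:
            t[item] -= units             # lowering the quantity is enough
        else:
            del t[item]                  # otherwise drop the item from this transaction


profit = {"a": 5, "b": 2, "c": 1}
db = [{"a": 2, "b": 3}, {"a": 1, "b": 1, "c": 4}, {"b": 2, "c": 1}]
sanitized = hide_by_max_utility_item(db, profit, {"a", "b"}, min_util=20)
print(itemset_utility(db, profit, {"a", "b"}),
      itemset_utility(sanitized, profit, {"a", "b"}))
```

In this toy database the sensitive itemset {a, b} starts above the threshold of 20 and ends below it after one unit of `a` is removed from the first transaction. The same change also lowers the utility of the non-sensitive itemset {a}, which is the kind of side effect the proposed algorithms aim to keep small.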
Acknowledgements
This work is supported by the National Natural Science Foundation of China (61802344); Zhejiang Provincial Natural Science Foundation of China (LY16F030012); Ningbo Natural Science Foundation of China (2017A610118); General Scientific Research Projects of Zhejiang Education Department (Y201534788) and Youth Foundation for Humanities and Social Sciences Research of Ministry of Education of China (16YJCZH112).
Cite this article
Liu, X., Wen, S. & Zuo, W. Effective sanitization approaches to protect sensitive knowledge in high-utility itemset mining. Appl Intell 50, 169–191 (2020). https://doi.org/10.1007/s10489-019-01524-2
DOI: https://doi.org/10.1007/s10489-019-01524-2