research-article

On-Shelf Utility Mining of Sequence Data

Published: 21 July 2021

Abstract

Utility mining has emerged as an important and interesting topic owing to its wide applications and considerable popularity. However, conventional utility mining methods are biased toward items that have a longer on-shelf time, because such items have a greater chance of generating high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) has been introduced. In this article, we focus on OSUM of sequence data, where the sequential database is divided into several partitions according to time periods, and items are associated with utilities and several on-shelf time periods. To address this problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies that reduce the search space and avoid redundant calculation by means of two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate the calculation of upper bounds and utilities. Substantial experimental results on several real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, whereas OSUMS+ has wider real-life applicability owing to its high efficiency.
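The abstract does not spell out the utility definitions, so the following is only a minimal Python sketch of the on-shelf idea under assumed conventions; it is not the OSUMS or OSUMS+ algorithm. It assumes that (1) a sequential pattern's utility in one q-sequence is the maximum over all of its embeddings, the usual convention in high-utility sequential pattern mining; (2) a pattern's on-shelf periods are the time periods in which it occurs; and (3) its relative utility is its total utility divided by the total utility of all q-sequences in those periods. The toy data, the restriction to one item per pattern element, and the function names `pattern_utility`, `sequence_utility`, and `relative_on_shelf_utility` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy representation (not taken from the article):
# a q-sequence is a list of itemsets, each mapping item -> purchased quantity.
QSequence = List[Dict[str, int]]


def pattern_utility(pattern: List[str], qseq: QSequence,
                    profit: Dict[str, float]) -> Optional[float]:
    """Max utility of `pattern` (one item per element) over all of its
    embeddings in `qseq`, or None if the pattern does not occur (assumed convention)."""
    neg = float("-inf")
    # best[k]: best utility of matching pattern[:k] in the itemsets scanned so far
    best = [0.0] + [neg] * len(pattern)
    for itemset in qseq:
        # Walk k downwards so two pattern elements never match the same itemset.
        for k in range(len(pattern), 0, -1):
            item = pattern[k - 1]
            if item in itemset and best[k - 1] != neg:
                best[k] = max(best[k], best[k - 1] + itemset[item] * profit[item])
    return None if best[-1] == neg else best[-1]


def sequence_utility(qseq: QSequence, profit: Dict[str, float]) -> float:
    """Total utility of a whole q-sequence."""
    return sum(q * profit[i] for itemset in qseq for i, q in itemset.items())


def relative_on_shelf_utility(pattern: List[str],
                              db: List[Tuple[int, QSequence]],
                              profit: Dict[str, float]) -> float:
    """Pattern utility divided by the total utility of the periods in which
    the pattern occurs (these periods are assumed to be its on-shelf periods)."""
    total, periods = 0.0, set()
    for period, qseq in db:
        u = pattern_utility(pattern, qseq, profit)
        if u is not None:
            total += u
            periods.add(period)
    period_total = sum(sequence_utility(s, profit) for p, s in db if p in periods)
    return total / period_total if period_total else 0.0


# Hypothetical unit profits and a database of (time period, q-sequence) pairs.
profit = {"a": 2, "b": 5, "c": 1}
db = [
    (1, [{"a": 2}, {"b": 1, "c": 3}]),
    (1, [{"c": 4}, {"a": 1}]),
    (2, [{"a": 1, "b": 2}, {"b": 1}]),
    (3, [{"c": 2}]),
]

ru = relative_on_shelf_utility(["a", "b"], db, profit)
print(round(ru, 3))  # 0.457 (= 16 / 35)
```

Here `<a, b>` contributes 9 and 7 in periods 1 and 2, and those two periods carry a total utility of 35, so period 3 (where the pattern never occurs) does not dilute its score. Dividing by the utility of the pattern's own periods rather than of the whole database is what removes the advantage of items that stay on shelf longer. The sketch scans every q-sequence for each pattern and does no pruning; the article's methods add upper bounds (TPEU and TRSU) and dedicated data structures to shrink the search space, none of which is modeled here.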

          • Published in

            ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 2
            April 2022
            514 pages
            ISSN: 1556-4681
            EISSN: 1556-472X
            DOI: 10.1145/3476120

            Copyright © 2021 Association for Computing Machinery.

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 21 July 2021
            • Revised: 1 March 2021
            • Accepted: 1 March 2021
            • Received: 1 November 2020
            Published in TKDD, Volume 16, Issue 2

            Qualifiers

            • research-article
            • Refereed