research-article

On-Shelf Utility Mining of Sequence Data

Authors:
Chunkai Zhang, Harbin Institute of Technology, Shenzhen, China
Zilin Du, Harbin Institute of Technology, Shenzhen, China
Yuting Yang, Harbin Institute of Technology, Shenzhen, China
Wensheng Gan, Jinan University, Guangzhou, China
Philip S. Yu, University of Illinois at Chicago, IL, USA

ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 2, Article No. 21, pp. 1–31. https://doi.org/10.1145/3457570

Published: 21 July 2021

Abstract

Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods are biased toward items with longer on-shelf time, because such items have a greater chance of generating a high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this article, we focus on the task of OSUM of sequence data, where the sequential database is divided into several partitions according to time periods, and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS⁺, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies that reduce the search space and avoid redundant calculation using two upper bounds, time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate the calculation of upper bounds and utilities. Extensive experiments on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS⁺ has wider real-life applications owing to its high efficiency.
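The bias described in the abstract can be illustrated with a small sketch. It is not the OSUMS or OSUMS⁺ algorithm and does not implement TPEU or TRSU; it only assumes the relative-utility idea common in the on-shelf utility mining literature, where a pattern's utility is compared against the total utility of the time periods in which all of its items are on shelf. The function names `pattern_utility` and `on_shelf_ratio`, the data layout, and the toy database are hypothetical, and the paper's exact definitions may differ.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed layout: a q-sequence is an ordered list of itemsets,
# each itemset mapping an item to its utility in that itemset.
QSequence = List[Dict[str, int]]

def pattern_utility(pattern: List[Set[str]], seq: QSequence) -> int:
    """Maximum utility over all occurrences of `pattern` in `seq` (0 if absent)."""
    # best[j] = best utility of matching the first j pattern elements so far.
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in seq:
        # Go right to left so one itemset matches at most one pattern element.
        for j in range(len(pattern), 0, -1):
            prev = best[j - 1]
            if prev is not None and pattern[j - 1].issubset(itemset):
                cand = prev + sum(itemset[i] for i in pattern[j - 1])
                if best[j] is None or cand > best[j]:
                    best[j] = cand
    return best[-1] or 0

def on_shelf_ratio(pattern: List[Set[str]],
                   db: List[Tuple[int, QSequence]],
                   on_shelf: Dict[str, Set[int]]) -> float:
    """Pattern utility divided by the total utility of its on-shelf periods."""
    # Assumption: a pattern is on shelf only in periods shared by all its items.
    periods = set.intersection(*(on_shelf[i] for s in pattern for i in s))
    seqs = [seq for period, seq in db if period in periods]
    total = sum(u for seq in seqs for itemset in seq for u in itemset.values())
    gained = sum(pattern_utility(pattern, seq) for seq in seqs)
    return gained / total if total else 0.0

# Toy database (hypothetical): (time period, q-sequence).
db = [
    (1, [{"a": 2, "b": 3}, {"c": 4}]),
    (1, [{"b": 1}, {"c": 6}]),
    (2, [{"a": 5}, {"c": 1}, {"a": 2, "c": 3}]),
    (2, [{"d": 10}]),
]
on_shelf = {"a": {1, 2}, "b": {1}, "c": {1, 2}, "d": {2}}

bc = [{"b"}, {"c"}]   # on shelf in period 1 only
ac = [{"a"}, {"c"}]   # on shelf in periods 1 and 2
print(on_shelf_ratio(bc, db, on_shelf))            # 0.875
print(round(on_shelf_ratio(ac, db, on_shelf), 3))  # 0.378
```

Both toy patterns reach the same absolute utility (14), so a whole-database threshold would treat them identically. Measured against only the periods in which it could be sold, the pattern containing the short-lived item `b` scores much higher.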

Published in
ACM Transactions on Knowledge Discovery from Data Volume 16, Issue 2
April 2022
514 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3476120
Editor:
Charu Aggarwal
IBM T. J. Watson Research, USA
Copyright © 2021 Association for Computing Machinery.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 July 2021
- Revised: 1 March 2021
- Accepted: 1 March 2021
- Received: 1 November 2020
Author Tags
On-shelf utility mining
utility mining
sequence data
data mining
Qualifiers
- research-article
- Refereed