Abstract
Utility mining has emerged as an important topic owing to its wide range of applications. However, conventional utility mining methods are biased toward items with longer on-shelf time, because such items have a greater chance of generating a high utility. The problem of on-shelf utility mining (OSUM) was introduced to eliminate this bias. In this article, we focus on OSUM of sequence data, where the sequential database is divided into several partitions according to time periods, and each item is associated with utilities and with several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we design several strategies that reduce the search space and avoid redundant calculation, based on two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate the calculation of upper bounds and utilities. Extensive experimental results on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applications owing to its high efficiency.
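The bias described above can be illustrated with a small brute-force sketch. It is not the OSUMS or OSUMS+ algorithm and uses neither TPEU nor TRSU. It assumes the formulation common in on-shelf utility mining: a pattern is evaluated only in the periods where all of its items are on shelf, and its utility is divided by the total utility of those periods. The names `pattern_utility`, `common_periods`, and `relative_utility`, the toy database, and the assumption of positive utilities are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# A quantitative sequence: ordered itemsets, each mapping item -> utility.
QSequence = List[Dict[str, int]]
Pattern = List[FrozenSet[str]]


def pattern_utility(pattern: Pattern, qseq: QSequence) -> int:
    """Maximum utility over all occurrences of `pattern` in `qseq` (0 if absent)."""
    # best[j] = best utility when the first j pattern itemsets are matched so far.
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in qseq:
        # Go from high j to low j so one itemset is matched at most once per occurrence.
        for j in range(len(pattern), 0, -1):
            prev = best[j - 1]
            if prev is not None and pattern[j - 1].issubset(itemset):
                cand = prev + sum(itemset[i] for i in pattern[j - 1])
                if best[j] is None or cand > best[j]:
                    best[j] = cand
    return best[-1] or 0


def common_periods(pattern: Pattern, on_shelf: Dict[str, Set[int]]) -> Set[int]:
    """Periods in which every item of the pattern is on shelf (assumed definition)."""
    items = set().union(*pattern)
    return set.intersection(*(on_shelf[i] for i in items))


def relative_utility(pattern: Pattern, db: Dict[int, List[QSequence]],
                     periods: Set[int]) -> float:
    """Pattern utility within `periods`, divided by the total utility of those periods."""
    num = sum(pattern_utility(pattern, s) for p in periods for s in db[p])
    den = sum(sum(it.values()) for p in periods for s in db[p] for it in s)
    return num / den if den else 0.0


# Toy database partitioned into two time periods; item "b" is only sold in period 2.
db = {
    1: [[{"a": 4}, {"c": 2}], [{"a": 2, "c": 4}]],
    2: [[{"a": 2}, {"b": 6}], [{"b": 3}, {"a": 1}, {"b": 3}, {"c": 2}]],
}
on_shelf = {"a": {1, 2}, "b": {2}, "c": {1, 2}}
pattern = [frozenset({"a"}), frozenset({"b"})]  # the sequential pattern <a, b>

conventional = relative_utility(pattern, db, set(db))                     # all periods
on_shelf_ratio = relative_utility(pattern, db, common_periods(pattern, on_shelf))
print(f"conventional: {conventional:.3f}, on-shelf: {on_shelf_ratio:.3f}")
```

In this toy data the pattern `<a, b>` is penalized by the conventional ratio for period 1, when `b` was not yet for sale; restricting the denominator to the pattern's own on-shelf periods removes that penalty.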
Index Terms
- On-Shelf Utility Mining of Sequence Data