Efficient k-dominant skyline query over incomplete data using MapReduce

  • Research Article
  • Frontiers of Computer Science

Abstract

Skyline queries are widely used in real-life applications to filter out uninteresting data objects. However, a skyline query may return too many results, especially on high-dimensional datasets, because it offers no way to control the retrieval conditions. As an extension of the skyline query, the k-dominant skyline query relaxes the dominance relation: by adjusting the parameter k, the number of dimensions on which one object must dominate another, it reduces the number of returned objects. In addition, with the continued growth of big data applications, acquired data may lack some of the desired content for practical reasons such as transmission failures, depleted batteries, or accidental loss, so the data may be incomplete, with missing values in some attributes. Existing k-dominant skyline query algorithms for incomplete data depend to some degree on user definitions, and their results cannot be shared. Moreover, existing algorithms cannot be applied directly to incomplete big data. Motivated by these observations, this paper studies the k-dominant skyline query problem over incomplete datasets in a distributed MapReduce environment. First, we propose an index structure for incomplete data, named incomplete data index based on dominant hierarchical tree (ID-DHT). Using a bucket strategy, it divides the incomplete data into buckets according to the dimensions of the missing attributes. Second, we propose a query algorithm for incomplete data in the MapReduce environment, named MapReduce incomplete data based on dominant hierarchical tree algorithm (MR-ID-DHTA). The Map function allocates the data in each bucket to subspaces according to the dominance condition, and the Reduce function processes the data by key value and returns the k-dominant skyline query result. Experiments demonstrate the effectiveness and usability of the proposed index structure and algorithm.
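The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea of bucketing incomplete data by missing pattern and then filtering with k-dominance, under assumptions that are not taken from the paper: smaller values are preferred, `None` marks a missing value, a point k-dominates another when it is no worse on at least k dimensions observed in both and strictly better on at least one of them, and buckets are keyed by a hypothetical bitmap of observed dimensions. The Map and Reduce phases are simulated with plain functions rather than a Hadoop job, the ID-DHT index and the subspace allocation of MR-ID-DHTA are not reproduced, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical sketch of bucket-then-filter k-dominant skyline; not MR-ID-DHTA.
# One data object; None marks a missing attribute value.
# Assumption: smaller values are preferred on every dimension.
Point = Tuple[Optional[float], ...]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Assumed rule: among the dimensions observed in both points, p is no
    worse than q on at least k of them and strictly better on at least one."""
    not_worse = 0
    strictly_better = False
    for a, b in zip(p, q):
        if a is None or b is None:
            continue  # only dimensions observed in both points are compared
        if a <= b:
            not_worse += 1
            if a < b:
                strictly_better = True
    return not_worse >= k and strictly_better


def bucket_key(p: Point) -> str:
    """Hypothetical bucket key: bitmap of observed dimensions, e.g. '101'."""
    return "".join("0" if v is None else "1" for v in p)


def map_phase(points: Iterable[Point]) -> Iterator[Tuple[str, Point]]:
    """Map: emit (bucket key, point) so points sharing a missing pattern meet."""
    for p in points:
        yield bucket_key(p), p


def shuffle(pairs: Iterable[Tuple[str, Point]]) -> Dict[str, List[Point]]:
    """Group emitted pairs by key, as the MapReduce framework would."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    return groups


def reduce_phase(bucket: List[Point], k: int) -> List[Point]:
    """Reduce: keep the points of one bucket not k-dominated inside that bucket."""
    return [
        p
        for i, p in enumerate(bucket)
        if not any(k_dominates(o, p, k) for j, o in enumerate(bucket) if j != i)
    ]


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    groups = shuffle(map_phase(points))
    candidates = [p for bucket in groups.values() for p in reduce_phase(bucket, k)]
    # Cross-bucket dominance was never checked, and k-dominance is not
    # transitive, so every candidate is re-checked against all points.
    return [c for c in candidates if not any(k_dominates(o, c, k) for o in points)]


if __name__ == "__main__":
    data = [(1, 2, None), (2, 3, 1), (None, 1, 2), (3, 3, 3)]
    print(k_dominant_skyline(data, k=2))
```

Running it prints `[(1, 2, None), (None, 1, 2)]`. The complete point (2, 3, 1) survives its own bucket but is 2-dominated by (1, 2, None) from another bucket, which is why the final cross-bucket check is needed. With k=3 the same data yields three objects, showing how a smaller k prunes more aggressively.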

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62072220, 61802160, 61502215), the China Postdoctoral Science Foundation Funded Project (2020M672134), the Science Research Fund of Liaoning Province Education Department (LJC201913), and the Doctor Research Start-up Fund of Liaoning Province (20180540106).

Author information

Correspondence to Baoyan Song.

Additional information

Linlin Ding received her MSc and PhD degrees in computer science and technology from Northeastern University, China, in 2008 and 2013, respectively. She is currently an associate professor in the School of Information, Liaoning University, China. Her research interests include big data management and graph data management. She has published more than 40 research papers in international conference proceedings and journals.

Shu Wang received her BSc in computer science and technology from Liaoning University, China, in 2019. She is currently a master's student in the School of Information, Liaoning University, China. Her research areas are big data management and skyline queries.

Baoyan Song received her BSc, MSc and PhD degrees in computer science from Northeastern University, China, in 1988, 1996 and 2002, respectively. She is currently a full professor in the School of Information, Liaoning University, China. Her research interests include big data management, data stream management and graph data management. She has published more than 100 research papers in international conference proceedings and journals.

About this article

Cite this article

Ding, L., Wang, S. & Song, B. Efficient k-dominant skyline query over incomplete data using MapReduce. Front. Comput. Sci. 15, 154611 (2021). https://doi.org/10.1007/s11704-020-0122-x

  • DOI: https://doi.org/10.1007/s11704-020-0122-x
