Efficient k-dominant skyline query over incomplete data using MapReduce, Frontiers of Computer Science



Frontiers of Computer Science (IF 3.4), Pub Date: 2021-04-16, DOI: 10.1007/s11704-020-0122-x
Linlin Ding, Shu Wang, Baoyan Song

Skyline queries are widely used in real-life applications to filter out uninteresting data objects. However, a skyline query may return too many results, especially on high-dimensional datasets, because it offers no control over the retrieval conditions. As an extension of the skyline query, the k-dominant skyline query relaxes the requirement that dominance hold on every dimension: by tuning the parameter k, the number of retrieved objects can be reduced. In addition, with the continued spread of big data applications, acquired data may lack some of the intended content for practical reasons such as delivery failure, battery depletion, or accidental loss, so the data may be incomplete, with missing values in some attributes. Existing k-dominant skyline query algorithms for incomplete data depend to some degree on user-defined settings, and their results cannot be shared. Moreover, existing algorithms are not suited to direct use on incomplete big data. Motivated by these issues, this paper studies the k-dominant skyline query problem over incomplete datasets in a distributed setting such as MapReduce. First, we propose an index structure over incomplete data, named incomplete data index based on dominate hierarchical tree (ID-DHT). Using a bucket strategy, the incomplete data are divided into buckets according to the dimensions of their missing attributes. Second, we put forward a query algorithm for incomplete data in the MapReduce environment, named MapReduce incomplete data based on dominant hierarchical tree algorithm (MR-ID-DHTA). The Map function allocates the data in each bucket to subspaces according to the dominance condition. The Reduce function processes the data according to the key value and returns the k-dominant skyline query result. Experiments demonstrate the validity and usability of the index structure and the algorithm.
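
To make the terms in the abstract concrete, the following is a minimal single-machine sketch, not the paper's ID-DHT index or MR-ID-DHTA algorithm. It rests on assumptions the abstract does not state: smaller values are preferred, a missing value is represented as None, and two incomplete objects are compared only on dimensions observed in both. The function names (missing_pattern, bucket_by_missing, k_dominates, k_dominant_skyline) are hypothetical. An object p is taken to k-dominate q when p is no worse than q on at least k shared dimensions and strictly better on at least one.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# A data object; None marks a missing attribute value (assumed encoding).
Point = Tuple[Optional[float], ...]


def missing_pattern(p: Point) -> Tuple[bool, ...]:
    """Return a per-dimension flag that is True where the value is missing."""
    return tuple(v is None for v in p)


def bucket_by_missing(points: List[Point]) -> Dict[Tuple[bool, ...], List[Point]]:
    """Group objects into buckets that share the same set of missing dimensions."""
    buckets: Dict[Tuple[bool, ...], List[Point]] = defaultdict(list)
    for p in points:
        buckets[missing_pattern(p)].append(p)
    return dict(buckets)


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Check whether p k-dominates q on the dimensions observed in both.

    Assumes smaller is better. p must be no worse on at least k shared
    dimensions and strictly better on at least one of them.
    """
    shared = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    no_worse = sum(1 for a, b in shared if a <= b)
    strictly_better = any(a < b for a, b in shared)
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Naive all-pairs scan: keep objects that no other object k-dominates.

    k-dominance is not transitive, so every object is checked against
    all others rather than against a running candidate list.
    """
    return [
        q for j, q in enumerate(points)
        if not any(k_dominates(p, q, k) for i, p in enumerate(points) if i != j)
    ]
```

On the toy input [(1, 2, 3), (2, 1, None), (3, 3, 3), (None, 1, 1)] with k = 2, bucket_by_missing yields three buckets (one per missing-value pattern) and k_dominant_skyline returns (2, 1, None) and (None, 1, 1). The quadratic scan is only for illustration; the index and MapReduce partitioning described in the abstract exist to avoid this cost on large data.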


Updated: 2021-04-16
