Efficient computation of deletion-robust k-coverage queries
Knowledge and Information Systems (IF 2.5), Pub Date: 2021-01-13, DOI: 10.1007/s10115-020-01540-6
Jiping Zheng, Xingnan Huang, Yuan Ma

Extracting a small, manageable subset from a large-scale dataset so that users can understand the dataset as a whole is an important problem in multicriteria decision making. The problem has been widely studied in recent years, and various query models have been proposed, such as top-k, skyline, k-regret and k-coverage queries. Among these models, the k-coverage query is attractive because it offers stability, scale invariance and high traversal efficiency. However, existing methods, including those for k-coverage queries, cannot efficiently return an effective solution set once some points have been deleted from the dataset. In this paper, we study the robustness of k-coverage queries under dynamic deletion of data points in two cases. In the first case, the whole dataset is assumed to be available in advance; in the second, the data points arrive in a stream. For a centralized dataset, we introduce a sieving mechanism that uses a precalculated threshold to filter a coreset from the entire dataset. The k-coverage query can then be carried out on this small coreset instead of the entire dataset, and we propose a threshold-based k-coverage query algorithm that greatly accelerates query processing. For a streaming dataset, we adopt a special chain structure and propose a single-pass streaming algorithm named Robust-Sieving. The coreset-based method is also extended to this setting. In addition, sampling techniques are adopted to accelerate query processing in both cases. Extensive experiments verify the effectiveness of the proposed Robust-Sieving algorithm and of the coreset-based algorithms, with and without sampling.
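
The abstract does not spell out the algorithms, so the following is only a minimal Python sketch of the general idea of filtering a coreset with a threshold and then answering a k-coverage query on it after deletions. It assumes a dominance-based notion of coverage (a point covers every point it dominates, larger values being better), a simple per-point threshold `tau`, and a plain greedy selection. The function names `dominates`, `sieve_coreset` and `greedy_k_coverage` are hypothetical, and this is not the paper's threshold rule or its Robust-Sieving algorithm.

```python
from typing import Dict, List, Sequence, Set, Tuple

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Assumed coverage relation: p is >= q in every dimension and > in at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def sieve_coreset(points: List[Point], tau: int) -> Dict[int, Set[int]]:
    """Keep only points that individually cover at least tau other points.

    Returns a map from the index of each kept point to the set of indices it covers.
    The threshold rule is an illustrative assumption, not the paper's.
    """
    coreset = {}
    for i, p in enumerate(points):
        covered = {j for j, q in enumerate(points) if dominates(p, q)}
        if len(covered) >= tau:
            coreset[i] = covered
    return coreset


def greedy_k_coverage(
    coreset: Dict[int, Set[int]], k: int, deleted: Set[int] = frozenset()
) -> Tuple[List[int], int]:
    """Greedily pick up to k surviving coreset points that maximize coverage.

    Deleted points can neither be chosen nor counted as covered.
    """
    candidates = {i: c - deleted for i, c in coreset.items() if i not in deleted}
    chosen: List[int] = []
    covered: Set[int] = set()
    for _ in range(k):
        # Pick the candidate with the largest marginal gain in coverage.
        best = max(candidates, key=lambda i: len(candidates[i] - covered), default=None)
        if best is None or not (candidates[best] - covered):
            break  # no candidate adds new coverage
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen, len(covered)


if __name__ == "__main__":
    pts = [(5, 5), (4, 1), (1, 4), (3, 3), (2, 2), (1, 1)]
    core = sieve_coreset(pts, tau=2)
    print(greedy_k_coverage(core, k=1))               # query on the coreset
    print(greedy_k_coverage(core, k=1, deleted={0}))  # same query after deleting point 0
```

In this toy setup the query after a deletion reuses the precomputed coreset instead of rescanning the full dataset, which is the kind of saving the abstract attributes to the coreset-based approach.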




Updated: 2021-01-13