Efficient Approaches to k Representative G-Skyline Queries
ACM Transactions on Knowledge Discovery from Data (IF 3.6), Pub Date: 2020-07-06, DOI: 10.1145/3397503
Xu Zhou 1 , Kenli Li 1 , Zhibang Yang 2 , Yunjun Gao 3 , Keqin Li 4

The G-Skyline (GSky) query is a powerful tool for analyzing optimal groups in decision support. Compared with other group skyline queries, it frees users from having to provide an aggregate function. It also returns more comprehensive results, because it does not overlook important groups that contain non-skyline points. However, the GSky query can return so many results that users struggle to make sensible choices, especially over a large, high-dimensional dataset or with a large group size. In this article, we investigate k representative G-Skyline (kGSky) queries to obtain a manageable number of optimal groups. The kGSky query inherits the advantages of the GSky query, and its results are representative and diversified. We propose three exact algorithms for processing the kGSky query efficiently, using novel techniques that include upper bound pruning, a grouping strategy, a layered optimum strategy, and a hybrid strategy. Because these exact algorithms have high time complexity and precise results are unnecessary in many applications, we further develop two approximate algorithms that trade some accuracy for efficiency. Extensive experiments on both real and synthetic datasets demonstrate the efficiency, scalability, and accuracy of the proposed algorithms.
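
To make the abstract's remark about groups containing non-skyline points concrete, here is a minimal brute-force sketch of a plain GSky query in Python. It is not one of the article's algorithms and does not select k representatives. It assumes the commonly used definition of group dominance (a group g-dominates another group of the same size if the points can be paired so that every point of the first is at least as good as its partner and at least one is strictly better), assumes smaller values are better in every dimension, and assumes all points are distinct. The function names and the sample data are hypothetical.

```python
from itertools import combinations, permutations


def dominates_or_equal(p, q):
    """True if p is at least as good as q in every dimension (smaller is better)."""
    return all(x <= y for x, y in zip(p, q))


def dominates(p, q):
    """True if p is at least as good as q everywhere and differs from q."""
    return dominates_or_equal(p, q) and p != q


def g_dominates(g, h):
    """True if group g dominates group h under some pairing of their points."""
    for perm in permutations(h):
        pairs = list(zip(g, perm))
        if all(dominates_or_equal(p, q) for p, q in pairs) and any(
            dominates(p, q) for p, q in pairs
        ):
            return True
    return False


def g_skyline(points, size):
    """Brute force: keep every group of the given size that no other group dominates."""
    groups = list(combinations(points, size))
    return [
        g for g in groups
        if not any(g_dominates(h, g) for h in groups if h != g)
    ]


# Hypothetical 2-D data; (3, 3) is dominated by (2, 2), so it is not a skyline point.
points = [(1, 4), (2, 2), (4, 1), (3, 3)]
result = g_skyline(points, 2)
```

On this data the group {(2, 2), (3, 3)} survives even though (3, 3) is not a skyline point, which is the kind of result the abstract says other group skyline queries can overlook. The enumeration is exponential in the group size, so the sketch is only suitable for tiny inputs.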

Updated: 2020-07-06