Flexible Skylines
ACM Transactions on Database Systems (IF 2.2), Pub Date: 2020-12-10, DOI: 10.1145/3406113
Paolo Ciaccia 1 , Davide Martinenghi 2

Skyline and ranking queries are two popular, alternative ways of discovering interesting data in large datasets. Skyline queries are simple to specify, as they just return the set of all non-dominated tuples, thereby providing an overall view of potentially interesting results. However, they are not equipped with any means to accommodate user preferences or to control the cardinality of the result set. Ranking queries adopt, instead, a specific scoring function to rank tuples, and can easily control the output size. While specifying a scoring function allows one to give different importance to different attributes by means of, e.g., weight parameters, choosing the “right” weights to use is known to be a hard problem. In this article, we embrace the skyline approach by introducing an original framework able to capture user preferences by means of constraints on the weights used in a scoring function, which is typically much easier than specifying precise weight values. To this end, we introduce the novel concept of F-dominance, i.e., dominance with respect to a family of scoring functions F: a tuple t is said to F-dominate tuple s when t is always better than or equal to s according to all the functions in F. Based on F-dominance, we present two flexible skyline (F-skyline) operators, both returning a subset of the skyline: nd, characterizing the set of non-F-dominated tuples; po, referring to the tuples that are also potentially optimal, i.e., best according to some function in F. While nd and po coincide and reduce to the traditional skyline when F is the family of all monotone scoring functions, their behaviors differ when subsets thereof are considered. We discuss the formal properties of these new operators, show how to implement them efficiently, and evaluate them on both synthetic and real datasets.
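
The notions of dominance, F-dominance, and the nd operator can be illustrated with a small Python sketch. It is not the implementation from the article, and it rests on several assumptions that the abstract does not state: lower attribute values are better, F is restricted to linear scoring functions f(t) = w · t, the admissible weight vectors form a convex polytope supplied by its vertices, and F-dominance requires a strict improvement under at least one function in F. Under these assumptions a score difference is linear in w, so checking the polytope's vertices is enough. The function names (`dominates`, `f_dominates`, `skyline`, `nd`) are hypothetical, and the po operator, which would typically require solving a linear program per tuple, is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(t: Point, s: Point) -> bool:
    """Classic dominance (assumption: lower values are better)."""
    return all(a <= b for a, b in zip(t, s)) and any(a < b for a, b in zip(t, s))


def score(w: Sequence[float], t: Point) -> float:
    """Linear scoring function f(t) = w . t."""
    return sum(wi * ti for wi, ti in zip(w, t))


def f_dominates(t: Point, s: Point, vertices: Sequence[Sequence[float]]) -> bool:
    """t F-dominates s for linear functions whose weights lie in a polytope.

    The score difference is linear in w, so it is enough to check the
    vertices of the weight polytope (assumed to be given by the caller).
    """
    diffs = [score(w, t) - score(w, s) for w in vertices]
    return all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)


def skyline(points: List[Point]) -> List[Point]:
    """Tuples not dominated by any other tuple."""
    return [t for t in points if not any(dominates(s, t) for s in points)]


def nd(points: List[Point], vertices: Sequence[Sequence[float]]) -> List[Point]:
    """Tuples not F-dominated by any other tuple."""
    return [t for t in points if not any(f_dominates(s, t, vertices) for s in points)]


# Hypothetical data: five 2-attribute tuples.
points = [(1, 9), (3, 4), (4, 3), (9, 1), (5, 5)]

# Weight polytope for the constraint w1 >= w2 with w1 + w2 = 1, w >= 0:
# its vertices are (1, 0) and (0.5, 0.5).
constrained = [(1.0, 0.0), (0.5, 0.5)]

# No extra constraint: the whole simplex, with vertices (1, 0) and (0, 1).
unconstrained = [(1.0, 0.0), (0.0, 1.0)]

print(skyline(points))
print(nd(points, constrained))
print(nd(points, unconstrained))
```

In this example the constraint w1 >= w2 shrinks the result from the four skyline tuples to two, while with unconstrained weights the nd result of this linear sketch equals the skyline, since F-dominance checked at the unit vectors is exactly attribute-wise dominance.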