Optimal Joins using Compact Data Structures
arXiv - CS - Computational Geometry, Pub Date: 2019-08-05, DOI: arxiv-1908.01812
Gonzalo Navarro and Juan L. Reutter and Javiel Rojas-Ledesma

Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now have several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice. However, implementing these algorithms often requires an enhanced indexing structure: to achieve optimality we must either build completely new indexes or populate the database with several instantiations of indexes such as B+-trees. Either way, this costs extra storage space that may be non-negligible. We show that optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need for extra storage. Our representation is a compact quadtree for the static indexes, and a dynamic quadtree sharing subtrees (which we dub a qdag) for intermediate results. We develop a compositional algorithm to process full join queries under this representation, and show that the running time of this algorithm is worst-case optimal in data complexity. Remarkably, we can extend our framework to evaluate more expressive queries from relational algebra by introducing a lazy version of qdags (lqdags). Once again, we can show that the running time of our algorithms is worst-case optimal.
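
To make the idea of treating relations as point sets in a grid more concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the authors' implementation: it uses a plain pointer-based tree of dictionaries rather than a compact quadtree, it does not build or share qdag subtrees, and it makes no claim of worst-case optimality. The function names `build` and `join`, and the example relations R(A, B) and S(B, C), are hypothetical. Each relation is stored as a tree over its own attributes, and the join walks all trees at once over the grid of the full attribute set, descending into a cell only when every relation has a non-empty child for its projection of that cell.

```python
from itertools import product

def build(points, side):
    """Build a pointer-based quadtree-like index over a grid of width `side`
    (a power of two). Returns None for an empty cell, True for an occupied
    unit cell, and otherwise a dict mapping child bits to subtrees."""
    if not points:
        return None
    if side == 1:
        return True
    half = side // 2
    buckets = {}
    for p in points:
        # One bit per coordinate selects the child cell holding the point.
        key = tuple(int(c >= half) for c in p)
        buckets.setdefault(key, []).append(tuple(c % half for c in p))
    return {key: build(pts, half) for key, pts in buckets.items()}

def join(trees, schemas, attrs, side, origin=None):
    """Natural join by simultaneous traversal (illustrative only).
    trees[i] indexes a relation over the attributes in schemas[i];
    attrs lists all attributes of the output, in output order."""
    if origin is None:
        origin = (0,) * len(attrs)
    if any(t is None for t in trees):
        return []                      # some relation is empty in this cell
    if side == 1:
        return [origin]                # one output tuple found
    half = side // 2
    out = []
    for bits in product((0, 1), repeat=len(attrs)):
        pick = dict(zip(attrs, bits))
        # Project the output cell onto each relation's own attributes.
        children = [t.get(tuple(pick[a] for a in schema))
                    for t, schema in zip(trees, schemas)]
        if all(c is not None for c in children):
            corner = tuple(o + b * half for o, b in zip(origin, bits))
            out.extend(join(children, schemas, attrs, half, corner))
    return out

# Hypothetical example data: R(A, B) and S(B, C) over the domain {0, 1, 2, 3}.
R = build([(0, 1), (1, 2), (3, 3)], 4)
S = build([(1, 0), (2, 2), (2, 3)], 4)
result = join([R, S], [("A", "B"), ("B", "C")], ("A", "B", "C"), 4)
print(sorted(result))
```

The pruning step is the point of the sketch: a whole subgrid is skipped as soon as any one relation has no points in its projection of that subgrid, so no intermediate pairwise join result is ever materialized.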

Updated: 2020-01-10