Towards Approximate Query Enumeration with Sublinear Preprocessing Time, arXiv - CS - Logic in Computer Science

arXiv - CS - Logic in Computer Science, Pub Date: 2021-01-15, DOI: arxiv-2101.06240
Isolde Adler, Polly Fahey

This paper aims at providing extremely efficient algorithms, with performance and accuracy guarantees, for approximate query enumeration on sparse databases. We introduce a new model for approximate query enumeration on classes of relational databases of bounded degree. We first prove that on databases of bounded degree, any local first-order definable query can be enumerated approximately with constant delay after a constant-time preprocessing phase. We extend this, showing that on databases of bounded tree-width and bounded degree, every query expressible in first-order logic can be enumerated approximately with constant delay after a sublinear (more precisely, polylogarithmic) time preprocessing phase. Durand and Grandjean (ACM Transactions on Computational Logic 2007) proved that exact enumeration of first-order queries on databases of bounded degree can be done with constant delay after a linear-time preprocessing phase. Hence we achieve a significant speed-up in the preprocessing phase. Since sublinear running time does not allow reading the whole input database even once, sacrificing some accuracy is inevitable for our speed-up. Nevertheless, our enumeration algorithms come with guarantees: with high probability, (1) only tuples that are answers to the query, or 'close' to being answers to the query, are enumerated, and (2) if the proportion of tuples that are answers to the query is sufficiently large, then all answers will be enumerated. Here the notion of 'closeness' is a tuple edit distance in the input database. For local first-order queries, only actual answers are enumerated, strengthening (1). Moreover, both the 'closeness' and the proportion required in (2) are controllable. We combine methods from property testing of bounded-degree graphs with logic and query enumeration, which we believe can inspire further research.
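The abstract does not spell out the algorithms. As a rough illustration of two ingredients it names, locality on bounded-degree databases and the property-testing idea of inspecting only a constant number of sampled elements, here is a minimal Python sketch under our own assumptions: the database is a simple undirected graph stored as a list of adjacency sets indexed 0..n-1 (so a uniform element can be drawn in constant time), and the local query is the hypothetical "x lies on a triangle". The names `ball`, `on_triangle`, `sample_size` and `estimate_answer_fraction` are illustrative and are not the authors' method; the sample bound is the standard Hoeffding inequality.

```python
import math
import random
from collections import deque


def ball(adj, v, r):
    """Vertices at distance <= r from v, via BFS truncated at radius r.

    If every vertex has degree <= d, the result has at most
    1 + d + d**2 + ... + d**r vertices: the cost depends on d and r only,
    not on the number of vertices n.
    """
    seen = {v}
    frontier = deque([(v, 0)])
    while frontier:
        u, dist = frontier.popleft()
        if dist == r:
            continue  # do not expand beyond radius r
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                frontier.append((w, dist + 1))
    return seen


def on_triangle(adj, v):
    """Hypothetical local query phi(x) = "x lies on a triangle".

    It only inspects the radius-1 ball around v, so with degree <= d it
    performs O(d**2) set lookups, independent of n.
    """
    nbrs = adj[v]
    return any(b in adj[a] for a in nbrs for b in nbrs if a < b)


def sample_size(eps, delta):
    """Hoeffding bound: with this many uniform samples the empirical fraction
    is within eps of the true fraction with probability >= 1 - delta.
    The bound does not depend on n."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def estimate_answer_fraction(adj, query, eps, delta, rng):
    """Estimate the fraction of vertices satisfying a local query by sampling.

    Assumes vertices are 0..n-1 so a uniform vertex is rng.randrange(n).
    Total work: sample_size(eps, delta) local checks, i.e. constant in n.
    """
    n = len(adj)
    s = sample_size(eps, delta)
    hits = sum(query(adj, rng.randrange(n)) for _ in range(s))
    return hits / s


# Usage: 500 disjoint triangles (vertices 0..1499) plus a path on 1500..2999.
# Maximum degree is 2 and exactly half of the vertices lie on a triangle.
n = 3000
adj = [set() for _ in range(n)]


def add_edge(u, v):
    adj[u].add(v)
    adj[v].add(u)


for t in range(0, 1500, 3):
    add_edge(t, t + 1)
    add_edge(t + 1, t + 2)
    add_edge(t, t + 2)
for u in range(1500, n - 1):
    add_edge(u, u + 1)

print(sample_size(0.05, 0.01))  # 1060
print(estimate_answer_fraction(adj, on_triangle, 0.05, 0.01, random.Random(0)))
```

Scanning every vertex and applying `on_triangle` would find the answers exactly, but only in linear time. The sampling step inspects about a thousand vertices whatever n is, which is in the spirit of the accuracy-for-speed trade the abstract describes.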

Updated: 2021-01-18