Tight Lower Bound for Comparison-Based Quantile Summaries
arXiv - CS - Data Structures and Algorithms. Pub Date: 2019-05-09, DOI: arxiv-1905.03838
Graham Cormode and Pavel Veselý

Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles, up to an error of at most $\varepsilon$. That is, an $\varepsilon$-approximate quantile summary first processes a stream of items and then, given any quantile query $0\le \phi\le 1$, returns an item from the stream, which is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon}\cdot \log \varepsilon N)$ items, where $N$ is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space $f(\varepsilon)\cdot o(\log N)$, for any function $f$ that does not depend on $N$. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of $(1\pm \varepsilon)\cdot \phi$, and for other related computational tasks.
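To make the $\varepsilon$-approximation guarantee concrete, the following is a minimal Python sketch, not the Greenwald and Khanna algorithm and not the construction from the paper. It assumes distinct items, sees the whole input at once (offline) rather than as a stream, and assumes $\varepsilon N \ge 1$. The function names `build_offline_summary`, `query`, and `is_valid_answer` are hypothetical and chosen for illustration only. The summary only sorts (compares) items, so it is comparison-based in the sense used above.

```python
import math
from bisect import bisect_right
from fractions import Fraction


def build_offline_summary(items, eps):
    """Keep (item, rank) pairs whose ranks are at most 2*eps*N apart.

    Assumes distinct items and eps * N >= 1. This is an offline
    illustration: it sorts the whole input, which a streaming summary
    cannot afford to do.
    """
    s = sorted(items)
    n = len(s)
    first = max(1, math.floor(eps * n))      # rank of the first kept item
    step = max(1, math.floor(2 * eps * n))   # gap between kept ranks
    ranks = list(range(first, n + 1, step))
    if ranks[-1] != n:
        ranks.append(n)                      # always keep the maximum
    return [(s[r - 1], r) for r in ranks]


def query(summary, n, phi):
    """Return the kept item whose rank is closest to phi * n."""
    target = phi * n
    return min(summary, key=lambda pair: abs(pair[1] - target))[0]


def is_valid_answer(items, answer, phi, eps):
    """Check that `answer` is a phi'-quantile for some phi' = phi +/- eps.

    With distinct items, the rank of `answer` is the number of items
    that are <= answer; the answer is valid if that rank is within
    eps * N of phi * N.
    """
    s = sorted(items)
    n = len(s)
    rank = bisect_right(s, answer)
    return abs(rank - phi * n) <= eps * n
```

Passing `eps` and `phi` as `fractions.Fraction` values (for example `Fraction(1, 20)`) keeps the rank arithmetic exact. Because this sketch sees the sorted input, it stores roughly $\frac{1}{2\varepsilon}$ items regardless of $N$; the lower bound discussed in the abstract concerns deterministic comparison-based summaries that process a stream, where a dependence on $\log \varepsilon N$ is shown to be unavoidable.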

Updated: 2020-01-17