An Empirical Study of Moment Estimators for Quantile Approximation
ACM Transactions on Database Systems (IF 2.2), Pub Date: 2021-03-18, DOI: 10.1145/3442337
Rory Mitchell 1 , Eibe Frank 2 , Geoffrey Holmes 2

We empirically evaluate lightweight moment estimators for the single-pass quantile approximation problem, including maximum entropy methods and orthogonal series with Fourier, cosine, Legendre, Chebyshev and Hermite basis functions. We show how to apply stable summation formulas to offset numerical precision issues for higher-order moments, leading to reliable single-pass moment estimators up to order 15. Additionally, we provide an algorithm for GPU-accelerated quantile approximation based on parallel tree reduction. Experiments evaluate the accuracy and runtime of moment estimators against the state-of-the-art KLL quantile estimator on 14,072 real-world datasets drawn from the OpenML database. Our analysis highlights the effectiveness of variants of moment-based quantile approximation for highly space-efficient summaries: their average performance using as few as five sample moments can approach the performance of a KLL sketch containing 500 elements. Experiments also illustrate the difficulty of applying the method reliably and showcase which moment-based approximations can be expected to fail or perform poorly.
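
The abstract names two ingredients, stable summation of higher-order moments and parallel tree reduction, without giving formulas. The following is a minimal Python sketch of how the two can fit together, not the authors' implementation. It assumes the well-known pairwise merge rule for central moment sums (a binomial shift of each partial sum to the combined mean); whether the paper uses exactly this rule is an assumption. The names `MomentSummary`, `from_value`, `merge` and `tree_reduce` are hypothetical, and the tree reduction is emulated sequentially, whereas a GPU version would merge all pairs on one level in parallel. Turning the resulting moments into quantile estimates (maximum entropy or orthogonal series) is not shown.

```python
from dataclasses import dataclass
from math import comb
from typing import List, Sequence


@dataclass
class MomentSummary:
    """Hypothetical mergeable summary: the mean plus central moment sums.

    m[p] holds sum((x - mean) ** p) over the summarized values,
    so m[0] is the count and m[1] is always 0.
    """
    mean: float
    m: List[float]


def from_value(x: float, order: int) -> MomentSummary:
    """Summary of a single value: count 1, all central sums 0 (order >= 1)."""
    m = [0.0] * (order + 1)
    m[0] = 1.0
    return MomentSummary(mean=x, m=m)


def merge(a: MomentSummary, b: MomentSummary) -> MomentSummary:
    """Combine two non-empty summaries without revisiting the data.

    Each side's central sums are re-centred on the combined mean with the
    binomial theorem, which avoids subtracting large raw power sums.
    """
    n_a, n_b = a.m[0], b.m[0]
    n = n_a + n_b
    delta = b.mean - a.mean
    shift_a = -delta * n_b / n  # (x - mean) = (x - a.mean) + shift_a
    shift_b = delta * n_a / n   # (x - mean) = (x - b.mean) + shift_b
    order = len(a.m) - 1
    m = [n, 0.0]
    for p in range(2, order + 1):
        total = 0.0
        for k in range(p + 1):
            total += comb(p, k) * (
                shift_a ** k * a.m[p - k] + shift_b ** k * b.m[p - k]
            )
        m.append(total)
    return MomentSummary(mean=a.mean + delta * n_b / n, m=m)


def tree_reduce(values: Sequence[float], order: int) -> MomentSummary:
    """Pairwise (tree-shaped) reduction of per-value summaries."""
    level = [from_value(float(x), order) for x in values]
    if not level:
        raise ValueError("need at least one value")
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd element carries over to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


if __name__ == "__main__":
    summary = tree_reduce([1.0, 2.0, 4.0, 8.0, 16.0], order=5)
    print(summary.mean, summary.m)
```

Because `merge` only needs the two summaries, the same function could serve both for combining GPU thread partial results and for merging summaries computed on separate data partitions; the summary size depends only on `order`, not on the number of values seen.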

Updated: 2021-03-18