An Empirical Study of Moment Estimators for Quantile Approximation

Published: 18 March 2021

Abstract

We empirically evaluate lightweight moment estimators for the single-pass quantile approximation problem, including maximum entropy methods and orthogonal series with Fourier, Cosine, Legendre, Chebyshev and Hermite basis functions. We show how to apply stable summation formulas to offset numerical precision issues for higher-order moments, leading to reliable single-pass moment estimators up to order 15. Additionally, we provide an algorithm for GPU-accelerated quantile approximation based on parallel tree reduction. Experiments evaluate the accuracy and runtime of moment estimators against the state-of-the-art KLL quantile estimator on 14,072 real-world datasets drawn from the OpenML database. Our analysis highlights the effectiveness of variants of moment-based quantile approximation for highly space-efficient summaries: their average performance using as few as five sample moments can approach the performance of a KLL sketch containing 500 elements. Experiments also illustrate the difficulty of applying the method reliably and showcase which moment-based approximations can be expected to fail or perform poorly.
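
To make the idea of stable, mergeable moment summaries concrete, the following is a minimal sketch in Python and not the article's implementation. It assumes each summary stores a count, a mean, and central power sums, and it merges two summaries by expanding the binomial shift of each mean to the combined mean, which avoids accumulating raw powers of the data. The names `summarize`, `merge`, and `tree_reduce` are hypothetical, and the sequential pairwise loop only stands in for the parallel tree reduction that a GPU would run.

```python
from math import comb


def summarize(x, order):
    """Summary of one value: (count, mean, m) with m[k] = sum((x - mean)**k)."""
    # For a single value, m[0] = 1 and every higher central power sum is 0.
    return (1, float(x), [1.0, 0.0] + [0.0] * (order - 1))


def merge(a, b):
    """Combine two summaries without revisiting the underlying data."""
    n_a, mean_a, m_a = a
    n_b, mean_b, m_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    shift_a = -n_b * delta / n  # mean_a minus the combined mean
    shift_b = n_a * delta / n   # mean_b minus the combined mean
    order = len(m_a) - 1
    m = []
    for p in range(order + 1):
        total = 0.0
        for k in range(p + 1):
            # Binomial expansion of ((x - own mean) + shift)**p, summed per side.
            total += comb(p, k) * (
                shift_a ** k * m_a[p - k] + shift_b ** k * m_b[p - k]
            )
        m.append(total)
    m[1] = 0.0  # the first central power sum is zero by definition
    return (n, mean_a + n_b * delta / n, m)


def tree_reduce(values, order):
    """Pairwise (tree-shaped) reduction; each level could run in parallel."""
    level = [summarize(x, order) for x in values]
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # carry the unpaired summary to the next level
        level = nxt
    return level[0]
```

Calling `tree_reduce(data, 4)` returns the count, the mean, and the central power sums up to order four; dividing `m[k]` by the count gives the k-th sample central moment. Because `merge` is associative up to rounding, the same function can combine partial summaries produced by separate workers.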

          • Published in

            ACM Transactions on Database Systems, Volume 46, Issue 1
            March 2021
            143 pages
            ISSN:0362-5915
            EISSN:1557-4644
            DOI:10.1145/3457891
            Copyright © 2021 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 18 March 2021
            • Accepted: 1 December 2020
            • Revised: 1 November 2020
            • Received: 1 December 2019

            Qualifiers

            • research-article
            • Research
            • Refereed
