Near-Optimal Coresets of Kernel Density Estimates

Phillips, Jeff M.; Tai, Wai Ming

doi:10.1007/s00454-019-00134-6

Published: 25 September 2019

Discrete & Computational Geometry, volume 63, pages 867–887 (2020)

Abstract

We construct near-optimal coresets for kernel density estimates for points in $\mathbb{R}^d$ when the kernel is positive definite. Specifically, we provide a polynomial-time construction for a coreset of size $O(\sqrt{d}/\varepsilon \cdot \sqrt{\log (1/\varepsilon)})$, and we show a near-matching lower bound of size $\Omega(\min\{\sqrt{d}/\varepsilon, 1/\varepsilon^2\})$. When $d \ge 1/\varepsilon^2$, it is known that the size of the coreset can be $O(1/\varepsilon^2)$. The upper bound is a polynomial-in-$(1/\varepsilon)$ improvement when $d \in [3, 1/\varepsilon^2)$, and the lower bound is the first known lower bound to depend on $d$ for this problem. Moreover, the upper bound's requirement that the kernel be positive definite still admits a wide variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel, which can be negative.
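To make the quantities in the abstract concrete, here is a minimal Python sketch. It assumes the usual setting in which a coreset $Q \subset P$ must satisfy $\max_x |\mathrm{KDE}_P(x) - \mathrm{KDE}_Q(x)| \le \varepsilon$, and it uses a Gaussian kernel as one example of a positive definite kernel. The function names, the bandwidth, and the choice to set every hidden constant to 1 are illustrative assumptions. The subset is drawn by uniform random sampling as a baseline and is not the construction from the article.

```python
import math
import random


def gaussian_kernel(x, p, bandwidth=1.0):
    """Gaussian kernel exp(-||x - p||^2 / (2 * bandwidth^2)); positive definite."""
    sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kde(points, x, kernel=gaussian_kernel):
    """Kernel density estimate at x: the average of K(x, p) over the point set."""
    return sum(kernel(x, p) for p in points) / len(points)


def max_kde_error(points, subset, queries):
    """Largest |KDE_P(x) - KDE_Q(x)| over a finite set of query points.

    This only lower-bounds the true worst-case error, which is a supremum
    over all of R^d.
    """
    return max(abs(kde(points, x) - kde(subset, x)) for x in queries)


def size_bounds(d, eps):
    """Upper and lower size bounds from the abstract.

    Assumption: all constants hidden by O(.) and Omega(.) are set to 1,
    so the values indicate growth rates only.
    """
    upper = math.sqrt(d) / eps * math.sqrt(math.log(1.0 / eps))
    lower = min(math.sqrt(d) / eps, 1.0 / eps ** 2)
    return upper, lower


if __name__ == "__main__":
    rng = random.Random(0)
    d, n, eps = 2, 2000, 0.1
    P = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    upper, lower = size_bounds(d, eps)
    # Baseline only: a uniform random sample, NOT the article's construction.
    Q = rng.sample(P, math.ceil(upper))
    queries = [[rng.uniform(-3.0, 3.0) for _ in range(d)] for _ in range(200)]
    print("upper, lower (constants dropped):", round(upper, 1), round(lower, 1))
    print("max error of random sample on queries:", max_kde_error(P, Q, queries))
```

Because the error is checked only at finitely many query points, the printed value can understate the true worst-case error; a random sample of this size carries no guarantee of meeting the $\varepsilon$ target.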

Notes

This article combines results published at SoCG 2018 [39] and SODA 2018 [38].

References

Arias-Castro, E., Mason, D., Pelletier, B.: On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm. J. Mach. Learn. Res. 17, 43 (2016)
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)
Bach, F., Lacoste-Julien, S., Obozinski, G.: On the equivalence between herding and conditional gradient algorithms. In: Proceedings of the 29th International Conference on Machine Learning (ICML’12), pp. 1355–1362. Omnipress (2012)
Banaszczyk, W.: Balancing vectors and Gaussian measures of $n$-dimensional convex bodies. Random Struct. Algorithms 12(4), 351–360 (1998)
Bansal, N., Dadush, D., Garg, S., Lovett, S.: The Gram–Schmidt walk: a cure for the Banaszczyk blues (STOC’18). In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 587–597. ACM, New York (2018)
Bentley, J.L., Saxe, J.B.: Decomposable searching problems I: static-to-dynamic transformations. J. Algorithms 1, 4 (1980)
Bobrowski, O., Mukherjee, S., Taylor, J.E.: Topological consistency via kernel estimation. Bernoulli 23(1), 288–328 (2017)
Chazelle, B.: The Discrepancy Method. Cambridge University Press, Cambridge (2000)
Chazelle, B., Matoušek, J.: On linear-time deterministic algorithms for optimization problems in fixed dimensions. J. Algorithms 21(3), 579–597 (1996)
Chen, Y., Welling, M., Smola, A.: Super-samples from kernel herding. In: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI’10), pp. 109–116. AUAI Press, Arlington (2010)
Clarkson, K.: Coresets, sparse greedy approximation, and the Frank–Wolfe algorithm. ACM Trans. Algorithms 6(4), 63 (2010)
Cortés, E.C., Scott, C.: Sparse approximation of a kernel mean. IEEE Trans. Signal Process. 65(5), 1310–1323 (2016)
Devroye, L., Györfi, L.: Nonparametric Density Estimation: The $L_1$ View. Wiley Series in Probability and Mathematical Statistics: Tracts on Probability and Statistics. Wiley, New York (1985)
Drineas, P., Mahoney, M.W.: On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J. Mach. Learn. Res. 6, 2153–2175 (2005)
Dunn, J.C.: Convergence rates for conditional gradient sequences generated by implicit step length rules. SIAM J. Control Optim. 18(5), 473–489 (1980)
Fan, J., Gijbels, I.: Local Polynomial Modelling and Its Applications. Monographs on Statistics and Applied Probability, vol. 66. Chapman & Hall, London (1996)
Fasy, B.T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., Singh, A.: Confidence sets for persistence diagrams. Ann. Stat. 42(6), 2301–2339 (2014)
Freund, R.M., Grigas, P.: New analysis and results for the Frank–Wolfe method. Math. Program. 155(1–2), 199–230 (2016)
Gärtner, B., Jaggi, M.: Coresets for polytope distance. In: Proceedings of the 25th Annual Symposium on Computational Geometry (SCG’09), pp. 33–42. ACM, New York (2009)
Glaunès, J.: Transport par difféomorphismes de points, de mesures et de courants pour la comparaison de formes et l’anatomie numérique. PhD thesis, Université Paris 13 (2005)
Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theoret. Comput. Sci. 38(2–3), 293–306 (1985)
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13, 723–773 (2012)
Harvey, N., Samadi, S.: Near-optimal herding. In: Proceedings of the 27th Conference on Learning Theory, vol. 35, pp. 1165–1183 (2014)
Hein, M., Bousquet, O.: Hilbertian metrics and positive definite kernels on probability measures. In: Proceedings of the International Conference on Artificial Intelligence and Statistics, pp. 136–143 (2005)
Hofmann, T., Schölkopf, B., Smola, A.J.: Kernel methods in machine learning. Ann. Stat. 36(3), 1171–1220 (2008)
Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning, vol. 28(1), pp. 427–435 (2013)
Jaggi, M., Lacoste-Julien, S.: On the global linear convergence of Frank–Wolfe optimization variants. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
Joshi, S., Kommaraju, R.V., Phillips, J.M., Venkatasubramanian, S.: Comparing distributions and shapes using the kernel distance. In: Proceedings of the 27th Annual Symposium on Computational Geometry (SoCG’11), pp. 47–56. ACM, New York (2011)
Lacoste-Julien, S., Lindsten, F., Bach, F.: Sequential kernel herding: Frank–Wolfe optimization for particle filtering. In: Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, pp. 544–552 (2015)
Li, Y., Long, P.M., Srinivasan, A.: Improved bounds on the sample complexity of learning. J. Comput. Syst. Sci. 62(3), 516–527 (2001)
Lopez-Paz, D., Muandet, K., Schölkopf, B., Tolstikhin, I.: Towards a learning theory of cause-effect inference. In: Proceedings of the 32nd International Conference on Machine Learning, vol. 37, pp. 1452–1461 (2015)
Matoušek, J.: Geometric Discrepancy: An Illustrated Guide. Algorithms and Combinatorics, vol. 18, 2nd edn. Springer, Berlin (2010)
Matoušek, J., Nikolov, A., Talwar, K.: Factorization norms and hereditary discrepancy. Int. Math. Res. Not. https://doi.org/10.1093/imrn/rny033
Muandet, K., Fukumizu, K., Sriperumbudur, B.K., Schölkopf, B.: Kernel mean embedding of distributions: a review and beyond. Found. Trends Mach. Learn. 10, 1–141 (2017)
Müller, A.: Integral probability metrics and their generating classes of functions. Adv. Appl. Probab. 29(2), 429–443 (1997)
Phillips, J.M.: Algorithms for $\varepsilon $-approximations of terrains. In: ICALP (2008)
Phillips, J.M.: $\varepsilon $-Samples for kernels. In: Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’13), pp. 1622–1632. SIAM, Philadelphia (2013)
Phillips, J.M., Tai, W.M.: Improved coresets for kernel density estimates. In: Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’18), pp. 2718–2727. SIAM, Philadelphia (2018)
Phillips, J.M., Tai, W.M.: Near-optimal coresets for kernel density estimates. In: Proceedings 34th International Symposium on Computational Geometry (SoCG’18), pp. 66:1–66:13. Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2018)
Phillips, J.M., Venkatasubramanian, S.: A gentle introduction to the kernel distance. arXiv:1103.1625 (2011)
Phillips, J.M., Wang, B., Zheng, Y.: Geometric inference on kernel density estimates. In: Proceedings of the 31st International Symposium on Computational Geometry (SoCG’15). Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2015)
Rinaldo, A., Wasserman, L.: Generalized density clustering. Ann. Stat. 38(5), 2678–2722 (2010)
Schoenberg, I.J.: Metric spaces and completely monotone functions. Ann. Math. 39(4), 811–841 (1938)
Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2002)
Schubert, E., Zimek, A., Kriegel, H.P.: Generalized outlier detection with flexible kernel density estimates. In: Proceedings of the SIAM International Conference on Data Mining, pp. 542–550 (2014)
Scott, D.W.: Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, New York (1992)
Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, London (1986)
Song, L., Zhang, X., Smola, A., Gretton, A., Schölkopf, B.: Tailoring density estimation via reproducing kernel moment matching. In: Proceedings of the 25th International Conference on Machine Learning (ICML’08), pp. 992–999. ACM, New York (2008)
Sriperumbudur, B.K., Gretton, A., Fukumizu, K., Schölkopf, B., Lanckriet, G.R.G.: Hilbert space embeddings and metrics on probability measures. J. Mach. Learn. Res. 11, 1517–1561 (2010)
Wahba, G.: Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV. In: Advances in Kernel Methods—Support Vector Learning, pp. 69–88. MIT Press, Cambridge (1999)
Zheng, Y., Phillips, J.M.: L$_{\infty}$ error and bandwidth selection for kernel density estimates of large data. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’15), pp. 1533–1542. ACM, New York (2015)

Author information

Authors and Affiliations

University of Utah, Salt Lake City, USA
Jeff M. Phillips & Wai Ming Tai

Corresponding author

Correspondence to Wai Ming Tai.

Additional information

Editor in Charge: Kenneth Clarkson

J.M. Phillips acknowledges support from NSF grants CCF-1350888, IIS-1251019, ACI-1443046, CNS-1514520, and CNS-1564287.

About this article

Cite this article

Phillips, J.M., Tai, W.M. Near-Optimal Coresets of Kernel Density Estimates. Discrete Comput Geom 63, 867–887 (2020). https://doi.org/10.1007/s00454-019-00134-6

Received: 22 June 2018
Revised: 14 August 2019
Accepted: 27 August 2019
Published: 25 September 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s00454-019-00134-6
