Density Sketches for Sampling and Estimation
arXiv - CS - Data Structures and Algorithms. Pub Date: 2021-02-24, DOI: arxiv-2102.12301
Aditya Desai, Benjamin Coleman, Anshumali Shrivastava

We introduce Density sketches (DS): a succinct online summary of the data distribution. DS can accurately estimate pointwise probability density. Interestingly, DS also provides the capability to sample unseen, novel data from the underlying data distribution. Thus, analogous to popular generative models, DS allows us to succinctly replace the real data in almost all machine learning pipelines with synthetic examples drawn from the same distribution as the original data. However, unlike generative models, which do not have any statistical guarantees, DS leads to theoretically sound, asymptotically convergent, consistent estimators of the underlying density function. Density sketches also have many appealing properties that make them ideal for large-scale distributed applications. DS construction is an online algorithm. The sketches are additive, i.e., the sum of two sketches is the sketch of the combined data. These properties allow data to be collected from distributed sources, compressed into a density sketch, efficiently transmitted in sketch form to a central server, merged, and re-sampled into a synthetic database for modeling applications. Thus, density sketches can potentially revolutionize how we store, communicate, and distribute data.
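
The three properties the abstract relies on (online construction, additive merging, and density estimation plus sampling from one summary) can be illustrated with a much simpler stand-in. The following is a minimal sketch assuming a fixed-width grid histogram; the class `HistogramDensitySketch` and the bin width `h` are hypothetical names for illustration. It is not the authors' Density Sketch construction, which this abstract does not specify, and it stores exact per-cell counts rather than a compressed sketch.

```python
import math
import random
from collections import Counter


class HistogramDensitySketch:
    """Toy grid-histogram density summary (hypothetical; not the paper's DS).

    Each point is mapped to the integer cell floor(x_i / h) in every
    dimension, and only the per-cell counts are kept.
    """

    def __init__(self, dim, bin_width):
        self.dim = dim
        self.h = bin_width
        self.counts = Counter()  # cell coordinates -> number of points seen
        self.n = 0               # total number of points inserted

    def _cell(self, x):
        return tuple(math.floor(xi / self.h) for xi in x)

    def insert(self, x):
        # Online update: a single pass over the data, O(dim) work per point.
        self.counts[self._cell(x)] += 1
        self.n += 1

    def merge(self, other):
        # Additivity: summing cell counts yields the summary of the union.
        if (self.dim, self.h) != (other.dim, other.h):
            raise ValueError("sketches must share dim and bin width")
        merged = HistogramDensitySketch(self.dim, self.h)
        merged.counts = self.counts + other.counts
        merged.n = self.n + other.n
        return merged

    def density(self, x):
        # Histogram estimate: cell count / (n * cell volume).
        if self.n == 0:
            return 0.0
        return self.counts[self._cell(x)] / (self.n * self.h ** self.dim)

    def sample(self, k, rng=random):
        # Pick cells in proportion to their counts, then a uniform point inside.
        cells = list(self.counts)
        weights = [self.counts[c] for c in cells]
        chosen = rng.choices(cells, weights=weights, k=k)
        return [tuple((ci + rng.random()) * self.h for ci in c) for c in chosen]


# Usage: two "sites" summarize their own data, then a server merges them.
rng = random.Random(0)
site_a = HistogramDensitySketch(dim=2, bin_width=0.5)
site_b = HistogramDensitySketch(dim=2, bin_width=0.5)
for _ in range(1000):
    site_a.insert((rng.gauss(0, 1), rng.gauss(0, 1)))
    site_b.insert((rng.gauss(0, 1), rng.gauss(0, 1)))

combined = site_a.merge(site_b)
print(combined.n)  # 2000
print(combined.density((0.1, 0.1)))
synthetic = combined.sample(5, rng)  # synthetic points from the merged summary
```

A plain histogram like this is a consistent density estimator only when the bin width shrinks while n·h^d grows, so memory grows with accuracy; controlling that trade-off is what a genuinely succinct sketch would have to address.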

Updated: 2021-02-25