Binarized Johnson-Lindenstrauss embeddings
arXiv - CS - Information Theory Pub Date : 2020-09-17 , DOI: arxiv-2009.08320
Sjoerd Dirksen and Alexander Stollenwerk

We consider the problem of encoding a set of vectors into a minimal number of bits while preserving information on their Euclidean geometry. We show that this task can be accomplished by applying a Johnson-Lindenstrauss embedding and subsequently binarizing each vector by comparing each entry of the vector to a uniformly random threshold. Using this simple construction we produce two encodings of a dataset such that one can query Euclidean information for a pair of points using a small number of bit operations up to a desired additive error - Euclidean distances in the first case and inner products and squared Euclidean distances in the second. In the latter case, each point is encoded in near-linear time. The number of bits required for these encodings is quantified in terms of two natural complexity parameters of the dataset - its covering numbers and localized Gaussian complexity - and shown to be near-optimal.
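
As a rough illustration of the construction described above, the following Python sketch applies a dense Gaussian matrix (one standard choice of Johnson-Lindenstrauss embedding, assumed here) and binarizes each coordinate against a threshold drawn uniformly from [-lam, lam]. The function names, the parameter `lam`, and the rescaling of the Hamming distance by sqrt(2*pi)*lam/m are illustrative assumptions rather than details stated in the abstract. The rescaling follows from E|<g, x - y>| = sqrt(2/pi) * ||x - y|| for a standard Gaussian vector g, and it is only accurate when `lam` is large compared with the norms of the points.

```python
import numpy as np


def make_encoder(n, m, lam, seed=0):
    """Return a function mapping a vector in R^n to m bits.

    Assumptions (not taken from the abstract): a dense Gaussian matrix
    serves as the embedding, and thresholds are uniform on [-lam, lam].
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))        # random linear map
    tau = rng.uniform(-lam, lam, size=m)   # one random threshold per output bit

    def encode(x):
        # Bit i records whether the i-th projected coordinate exceeds -tau[i].
        return (A @ np.asarray(x, dtype=float) + tau) >= 0

    return encode


def estimate_distance(bits_x, bits_y, lam):
    """Estimate ||x - y||_2 from two bit vectors produced by the same encoder.

    Uses the Hamming distance (an XOR followed by a bit count), rescaled by
    sqrt(2*pi) * lam / m. The rescaling is an assumption of this sketch.
    """
    m = bits_x.size
    hamming = np.count_nonzero(bits_x != bits_y)
    return np.sqrt(2.0 * np.pi) * lam * hamming / m


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal(50)
    y = rng.standard_normal(50)
    x /= np.linalg.norm(x)                 # keep points inside the unit ball
    y /= np.linalg.norm(y)

    encode = make_encoder(n=50, m=20000, lam=4.0)
    print("true distance:     ", np.linalg.norm(x - y))
    print("estimated distance:", estimate_distance(encode(x), encode(y), lam=4.0))
```

The estimate carries an additive error that shrinks as the number of bits `m` grows, which matches the additive-error guarantee the abstract refers to. The sketch says nothing about the near-optimal bit counts or the near-linear encoding time claimed for the second encoding.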

Updated: 2020-09-18