


$(2+\epsilon)$-ANN for time series under the Fr\'echet distance
arXiv - CS - Computational Geometry, Pub Date: 2020-08-21, DOI: arxiv-2008.09406
Anne Driemel, Ioannis Psarros

We give the first approximate-near-neighbor data structure for time series under the continuous Fr\'echet distance. Given a parameter $\epsilon \in (0,1]$, the data structure can be used to preprocess $n$ curves in $\mathbb{R}$ (aka time series), each of complexity $m$, to answer queries with a curve of complexity $k$ by either returning a curve that lies within Fr\'echet distance $2+\epsilon$, or answering that there exists no curve in the input within distance $1$. In both cases, the answer is correct. Our data structure uses space in $n\cdot \mathcal{O}\left({\epsilon^{-1}}\right)^{k} + \mathcal{O}(nm)$ and query time in $\mathcal{O}\left(k\right)$. We show that under some conditions the approximation factor achieved by our data structure is optimal in the cell-probe model of computation. Concretely, we show that for any data structure which achieves an approximation factor less than $2$ and which supports curves of arclength at most $L$, uses a word size bounded by $\mathcal{O}(L^{1-\epsilon})$ for some constant $\epsilon>0$, and answers the query using only a constant number of probes, the number of words used to store the data structure must be at least $L^{\Omega(k)}$. Our data structure uses only a constant number of probes per query and does not have any dependency on $L$. In particular, this shows that our solution is optimal if only a constant number of probes is allowed. Our second positive result is a probabilistic data structure based on locality-sensitive hashing, which achieves space in $\mathcal{O}(nm)$ and query time in $\mathcal{O}(k)$, and which answers queries with an approximation factor in $\mathcal{O}(k)$. Both of our data structures make use of the concept of signatures, which were originally introduced for the problem of clustering time series under the Fr\'echet distance.
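The abstract states the query contract but not how the continuous Fr\'echet distance between curves in $\mathbb{R}$ is evaluated. As a point of reference, here is a minimal Python sketch, assuming each curve is given as a list of vertex values in $\mathbb{R}$. It decides whether $d_F(P, Q) \le \delta$ using the classic Alt-Godau free-space reachability argument specialized to one dimension, then wraps that test in a brute-force linear scan that satisfies the same contract as the $(2+\epsilon)$-ANN structure. The names `free_interval`, `frechet_leq` and `ann_query_bruteforce` are hypothetical. This is not the authors' signature-based data structure, and the scan costs $\mathcal{O}(nmk)$ per query instead of $\mathcal{O}(k)$.

```python
from typing import List, Optional, Sequence, Tuple

# A closed sub-interval of [0, 1], or None when empty.
Interval = Optional[Tuple[float, float]]


def free_interval(a: float, b: float, q: float, delta: float) -> Interval:
    """Parameters t in [0, 1] with |a + t*(b - a) - q| <= delta (segment vs point in R)."""
    d = b - a
    if d == 0:
        return (0.0, 1.0) if abs(a - q) <= delta else None
    lo, hi = (q - delta - a) / d, (q + delta - a) / d
    if lo > hi:  # a decreasing segment flips the inequality
        lo, hi = hi, lo
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    return (lo, hi) if lo <= hi else None


def frechet_leq(P: Sequence[float], Q: Sequence[float], delta: float) -> bool:
    """Decide d_F(P, Q) <= delta for polygonal curves in R via free-space reachability."""
    n, m = len(P) - 1, len(Q) - 1  # number of segments of each curve
    if abs(P[0] - Q[0]) > delta or abs(P[-1] - Q[-1]) > delta:
        return False
    if n == 0 or m == 0:
        # Point vs curve: the largest distance is attained at a vertex.
        return all(abs(p - q) <= delta for p in P for q in Q)
    # LR[i][j]: reachable part of the left edge of cell (i, j) (vertex P[i] vs segment Q[j]Q[j+1]).
    # BR[i][j]: reachable part of the bottom edge of cell (i, j) (segment P[i]P[i+1] vs vertex Q[j]).
    LR: List[List[Interval]] = [[None] * m for _ in range(n + 1)]
    BR: List[List[Interval]] = [[None] * (m + 1) for _ in range(n)]
    for j in range(m):  # along x = 0 the path stays at P[0]
        if all(abs(P[0] - Q[r]) <= delta for r in range(j + 1)):
            LR[0][j] = free_interval(Q[j], Q[j + 1], P[0], delta)
    for i in range(n):  # along y = 0 the path stays at Q[0]
        if all(abs(Q[0] - P[r]) <= delta for r in range(i + 1)):
            BR[i][0] = free_interval(P[i], P[i + 1], Q[0], delta)
    for i in range(n):
        for j in range(m):
            left, bottom = LR[i][j], BR[i][j]
            right_free = free_interval(Q[j], Q[j + 1], P[i + 1], delta)
            top_free = free_interval(P[i], P[i + 1], Q[j + 1], delta)
            # Free space in a cell is convex: entering from the bottom reaches the whole
            # free right edge; entering only from the left requires y >= lowest left entry.
            if bottom is not None:
                LR[i + 1][j] = right_free
            elif left is not None and right_free is not None:
                lo = max(right_free[0], left[0])
                LR[i + 1][j] = (lo, right_free[1]) if lo <= right_free[1] else None
            # Symmetric rule for the top edge.
            if left is not None:
                BR[i][j + 1] = top_free
            elif bottom is not None and top_free is not None:
                lo = max(top_free[0], bottom[0])
                BR[i][j + 1] = (lo, top_free[1]) if lo <= top_free[1] else None
    # The end corner is free (checked above), so any reachable point on the last
    # right or top edge extends monotonically to it.
    return LR[n][m - 1] is not None or BR[n - 1][m] is not None


def ann_query_bruteforce(
    curves: Sequence[Sequence[float]], q: Sequence[float], eps: float
) -> Optional[int]:
    """Return an index i with d_F(curves[i], q) <= 2 + eps, or None.

    None is a correct 'no curve within distance 1' answer: if no curve is
    within 2 + eps, none is within 1. Linear scan, O(n * m * k) per query.
    """
    for idx, c in enumerate(curves):
        if frechet_leq(c, q, 2.0 + eps):
            return idx
    return None
```

For example, with curves `[[0, 5, 0], [0, 1, 0], [10, 11]]`, query `[0, 1.5, 0]` and `eps = 0.5`, the scan rejects the first curve (its peak at 5 is at least 3.5 away from every point of the query) and returns index 1, whose Fr\'echet distance to the query is 0.5. Going by the abstract, the paper's structure avoids this per-query scan by paying $n\cdot \mathcal{O}(\epsilon^{-1})^{k}$ space up front.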


Updated: 2020-11-05
