BestNeighbor: efficient evaluation of kNN queries on large time series databases
Knowledge and Information Systems (IF 2.7), Pub Date: 2020-11-16, DOI: 10.1007/s10115-020-01518-4
Oleksandra Levchenko, Boyan Kolev, Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas, Dennis Shasha, Patrick Valduriez

This paper presents parallel solutions, built on two state-of-the-art approaches (iSAX and sketches), for evaluating k-nearest-neighbor (kNN) queries on large time series databases. It compares them on several measures of quality and time performance, and offers a tool that uses the characteristics of application data to determine which algorithm to choose for that application and how to set its parameters. Specifically, our experiments show that: (i) iSAX and its derivatives perform best in both time and quality when the time series can be characterized by a few low-frequency Fourier coefficients, a regime where the iSAX pruning approach works well; (ii) iSAX performs significantly less well when high-frequency Fourier coefficients hold much of the energy of the time series; (iii) by contrast, a random projection approach based on sketches is more or less independent of the frequency power spectrum. The experiments also show a close relationship between pruning ratio and running time for exact iSAX, as well as between pruning ratio and result quality for approximate iSAX. Our toolkit analyzes typical time series of an application to determine (i) optimal segment sizes for iSAX and (ii) when to use Parallel Sketches instead of iSAX. Our algorithms have been implemented using Spark, evaluated over a cluster of nodes, and applied to both real and synthetic data. The results apply to any database of numerical sequences, whether or not they relate to time.
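To make the low-frequency versus high-frequency distinction in the abstract concrete, the following is a minimal sketch of how one might measure the share of a series' spectral energy carried by its lowest Fourier coefficients. It is not the paper's toolkit: the function names, the `num_low` parameter, and the 0.8 threshold are all assumptions chosen for illustration, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def low_frequency_energy_fraction(series, num_low):
    """Share of non-DC spectral energy in the num_low lowest frequencies.

    Illustrative helper (not from the paper). Uses a naive O(n^2) DFT.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the DC component
    energies = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        # For real input, frequency k mirrors n - k, so count it twice,
        # except the Nyquist term (k == n / 2), which has no mirror.
        weight = 1 if (n % 2 == 0 and k == n // 2) else 2
        energies.append(weight * abs(coeff) ** 2)
    total = sum(energies)
    if total == 0.0:
        return 1.0  # constant series: treat as trivially low-frequency
    return sum(energies[:num_low]) / total


def suggest_method(series, num_low=4, threshold=0.8):
    """Hypothetical decision rule; num_low and threshold are assumed values."""
    frac = low_frequency_energy_fraction(series, num_low)
    return "iSAX" if frac >= threshold else "sketch"


if __name__ == "__main__":
    smooth = [math.sin(2 * math.pi * t / 64) for t in range(64)]
    jagged = [(-1) ** t for t in range(64)]
    print(suggest_method(smooth))
    print(suggest_method(jagged))
```

Under these assumptions, a single-cycle sine wave concentrates its energy in the first coefficient and is routed to iSAX, while a series alternating between +1 and -1 concentrates its energy at the highest frequency and is routed to the sketch-based approach. A real selection rule would need thresholds calibrated on application data, as the abstract describes.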




Updated: 2020-11-16