Effective and Efficient Variable-Length Data Series Analytics, arXiv - CS - Databases


arXiv - CS - Databases. Pub Date: 2020-09-22, DOI: arxiv-2009.11648
Michele Linardi

In the last twenty years, data series similarity search has emerged as a fundamental operation at the core of several analysis tasks and applications related to data series collections. Many solutions to different mining problems work by means of similarity search. However, all the proposed solutions require prior knowledge of the series length on which similarity search is performed. In several cases, the choice of the length is critical and significantly influences the quality of the expected outcome. Unfortunately, the obvious brute-force solution, which provides an outcome for every length within a given range, is computationally untenable. In this Ph.D. work, we present the first solutions that inherently support scalable, variable-length similarity search in data series, applied to sequence/subsequence matching and to motif and discord discovery problems. The experimental results show that our approaches are up to orders of magnitude faster than the alternatives. They also demonstrate that we can remove the unrealistic constraint of performing analytics using a predefined length, leading to more intuitive and actionable results that would otherwise have been missed.
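To make the cost of the brute-force baseline mentioned in the abstract concrete, here is a minimal sketch of what such a baseline might look like. It is not the author's method. The function names, the use of z-normalized Euclidean distance, and the choice to treat each prefix of the query as the query for a given length are all assumptions made for illustration.

```python
import math


def znorm(seq):
    """Z-normalize a sequence (constant sequences map to all zeros)."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def best_match(series, query):
    """Fixed-length search: scan every subsequence of len(query).

    Returns (distance, offset) of the closest subsequence.
    """
    m = len(query)
    q = znorm(query)
    best = (math.inf, -1)
    for i in range(len(series) - m + 1):
        s = znorm(series[i:i + m])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))
        if d < best[0]:
            best = (d, i)
    return best


def brute_force_variable_length(series, query, min_len, max_len):
    """Hypothetical baseline: rerun the fixed-length search once per length.

    Assumption: the query for length m is the first m points of `query`.
    """
    results = {}
    for m in range(min_len, max_len + 1):
        results[m] = best_match(series, query[:m])
    return results
```

Each length in the range triggers a full scan of the series, and each scan re-normalizes every subsequence from scratch, so the total work grows with the number of lengths times the series size times the subsequence length. That repeated work is what makes this approach impractical on large collections and wide length ranges.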


Updated: 2020-09-25
