An efficient algorithm for approximated self-similarity joins in metric spaces, Information Systems


Information Systems (IF 3.7), Pub Date: 2020-02-24, DOI: 10.1016/j.is.2020.101510
Sebastián Ferrada, Benjamin Bustos, Nora Reyes

Similarity join is a key operation in metric databases: it retrieves all pairs of elements that are similar. Solving this problem usually requires comparing every pair of objects in the datasets, even when indexing and ad hoc algorithms are used. We propose a simple and efficient algorithm for computing the approximated $k$ nearest neighbor self-similarity join. The algorithm computes $\Theta(n^{3/2})$ distances, and experiments show that it reaches a precision of 46% on real-world datasets. We compare it with other common techniques, such as Quickjoin and Locality-Sensitive Hashing, and argue that our proposal has a better execution time and average precision.
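The abstract does not spell out the algorithm's steps, so the following is only an illustrative sketch of one way an approximated $k$ nearest neighbor self-join can stay within roughly $n^{3/2}$ distance computations. It is not the authors' method. The assumptions are: about $\sqrt{n}$ randomly chosen centers, groups capped at about $\sqrt{n}$ members, and a brute-force join inside each group. The function name `approx_knn_self_join` and its parameters are hypothetical.

```python
import math
import random


def approx_knn_self_join(points, k, dist, seed=0):
    """Approximate kNN self-join (illustrative sketch, not the paper's algorithm).

    Returns a dict mapping each index i to the indices of up to k
    approximate nearest neighbors of points[i].
    """
    n = len(points)
    if n == 0:
        return {}
    rng = random.Random(seed)
    m = max(1, math.isqrt(n))      # assumed: about sqrt(n) groups
    cap = math.ceil(n / m)         # assumed: each group holds about sqrt(n) items
    centers = rng.sample(range(n), m)
    groups = {c: [] for c in centers}

    # Phase 1: n * m distance computations.
    # Each element joins the nearest center whose group still has room.
    for i in range(n):
        by_distance = sorted(centers, key=lambda c: dist(points[i], points[c]))
        for c in by_distance:
            if len(groups[c]) < cap:
                groups[c].append(i)
                break

    # Phase 2: at most m * cap^2 distance computations.
    # Brute-force kNN restricted to the members of each group.
    result = {}
    for members in groups.values():
        for i in members:
            candidates = sorted(
                (dist(points[i], points[j]), j) for j in members if j != i
            )
            result[i] = [j for _, j in candidates[:k]]
    return result


if __name__ == "__main__":
    data = [(random.random(), random.random()) for _ in range(400)]
    neighbors = approx_knn_self_join(data, k=5, dist=math.dist)
    print(neighbors[0])
```

Because the group sizes are capped, both phases cost on the order of $n \cdot \sqrt{n}$ distance evaluations. The result is approximate: a true nearest neighbor that lands in a different group is missed, which is why the precision of such a scheme has to be measured empirically.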


Updated: 2020-02-24
