A memory-optimal many-to-many semi-stream join
Distributed and Parallel Databases (IF 1.5), Pub Date: 2018-08-31, DOI: 10.1007/s10619-018-7247-z
M. Asif Naeem, Gerald Weber, Christof Lutteroth

Semi-stream join algorithms join a fast stream input with a disk-based master data relation. A common class of these algorithms is derived from hash joins: they use the stream as build input for a main hash table, and also include a cache for frequent master data. The composition of the cache is very important for performance; however, the decision of which master data to cache has so far been solely based on heuristics. We present the first formal criterion, a cache inequality that leads to a provably optimal composition of the cache in a semi-stream many-to-many equijoin algorithm. We propose a novel algorithm, Semi-Stream Balanced Join (SSBJ), which exploits this cache inequality to achieve a given service rate with a provably minimal amount of memory for all stream distributions. We present a cost model for SSBJ and compare its service rate empirically and analytically with other related approaches.
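
To make the general class described above concrete, here is a minimal Python sketch of a hash-join-derived semi-stream join with a cache. It is not SSBJ: the abstract does not state the cache inequality, so the cache contents are passed in by hand. The names `semi_stream_join`, `cached_keys`, and `batch_size` are hypothetical, and the disk-based master relation is simulated by an in-memory dict that maps each join key to a list of master rows.

```python
from collections import defaultdict


def semi_stream_join(stream, master, cached_keys, batch_size=4):
    """Join (key, payload) stream tuples with master rows; many-to-many on key.

    Illustrative only: `master` stands in for the disk-based relation and
    `cached_keys` is chosen by the caller, not by the paper's cache inequality.
    """
    # Cache: master rows for the chosen keys are held in memory.
    cache = {k: master[k] for k in cached_keys if k in master}
    pending = defaultdict(list)  # main hash table, built from the stream
    pending_count = 0
    output = []

    def probe_disk():
        # Simulated disk phase: scan the master data once and probe the
        # hash table of waiting stream tuples with each master key.
        nonlocal pending_count
        for key, rows in master.items():
            for payload in pending.pop(key, []):
                output.extend((key, payload, row) for row in rows)
        pending.clear()  # stream tuples without a master match are dropped
        pending_count = 0

    for key, payload in stream:
        if key in cache:
            # Frequent keys are joined immediately from the cache.
            output.extend((key, payload, row) for row in cache[key])
        else:
            # Everything else waits in the hash table for the next disk scan.
            pending[key].append(payload)
            pending_count += 1
            if pending_count >= batch_size:
                probe_disk()
    probe_disk()  # flush the remaining stream tuples
    return output
```

In this sketch, a larger cache answers more stream tuples without waiting for a disk scan but leaves less memory for the hash table. That is the trade-off the abstract says has so far been settled by heuristics.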

Updated: 2018-08-31