Scalable Multiway Stream Joins in Hardware
IEEE Transactions on Knowledge and Data Engineering ( IF 8.9 ) Pub Date : 2020-12-01 , DOI: 10.1109/tkde.2019.2916860
Mohammadreza Najafi , Mohammad Sadoghi , Hans-Arno Jacobsen

Efficient real-time analytics are an integral part of an increasing number of data management applications, such as computational targeted advertising, algorithmic trading, and Internet of Things. In this paper, we focus primarily on accelerating stream joins, which are arguably one of the most commonly used and resource-intensive operators in stream processing. We propose a scalable circular pipeline design (Circular-MJ) in hardware to orchestrate a multiway join while minimizing data flow disruption. In this circular design, each new tuple (given its origin stream) starts its processing from a specific join core and passes through all respective join cores in a pipeline sequence to produce the final results. We also present a novel two-stage pipeline stream join (Stashed-MJ) that uses a best-effort buffering technique (referred to as stash) to maintain intermediate results. If an overwrite is detected in the stash, our design automatically resorts to recomputing intermediate results. Finally, we present a parallelized version of our multiway stream join by integrating our proposed pipelines into a parallel unidirectional flow-based architecture (Parallel-MJ). Our experimental results demonstrate a linear throughput scaling with respect to the numbers of streams and processing cores.
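The circular routing described above can be illustrated with a small, sequential software model. This is only a sketch under stated assumptions, not the authors' hardware design: it assumes an equi-join on a single integer key, one count-based sliding window per stream held by its own join core, and hypothetical names (`Record`, `JoinCore`, `CircularMultiwayJoin`) introduced purely for illustration. A new tuple is stored by the core of its origin stream and then probes the remaining cores in ring order, extending its partial results at each hop. In real hardware the cores would operate concurrently as pipeline stages; here they are simply visited in a loop.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass(frozen=True)
class Record:
    stream: int  # index of the origin stream
    key: int     # join attribute (assumption: equi-join on one key)
    seq: int     # arrival sequence number, only to make results readable


class JoinCore:
    """Hypothetical pipeline stage that owns the sliding window of one stream."""

    def __init__(self, stream: int, window_size: int) -> None:
        self.stream = stream
        # Count-based window (assumption): appending beyond maxlen evicts the oldest tuple.
        self.window: Deque[Record] = deque(maxlen=window_size)

    def store(self, rec: Record) -> None:
        self.window.append(rec)

    def probe(self, partials: List[Tuple[Record, ...]]) -> List[Tuple[Record, ...]]:
        # Extend every partial result with each tuple in this window whose key matches.
        out: List[Tuple[Record, ...]] = []
        for partial in partials:
            for rec in self.window:
                if rec.key == partial[0].key:
                    out.append(partial + (rec,))
        return out


class CircularMultiwayJoin:
    """Sequential model of a ring of join cores, one core per input stream."""

    def __init__(self, num_streams: int, window_size: int) -> None:
        self.cores = [JoinCore(i, window_size) for i in range(num_streams)]

    def insert(self, rec: Record) -> List[Tuple[Record, ...]]:
        n = len(self.cores)
        start = rec.stream
        self.cores[start].store(rec)  # the origin core keeps the new tuple
        partials: List[Tuple[Record, ...]] = [(rec,)]
        # Visit the other cores in ring order: start+1, start+2, ..., start+n-1.
        for hop in range(1, n):
            partials = self.cores[(start + hop) % n].probe(partials)
            if not partials:  # no partial result survived, so no final result is possible
                break
        return partials


if __name__ == "__main__":
    join = CircularMultiwayJoin(num_streams=3, window_size=4)
    join.insert(Record(stream=0, key=7, seq=0))
    join.insert(Record(stream=1, key=7, seq=1))
    results = join.insert(Record(stream=2, key=7, seq=2))
    print([tuple(r.seq for r in res) for res in results])  # prints [(2, 0, 1)]
```

Starting each tuple at its own stream's core means every core sees tuples from all streams, yet no stage needs to hold more than one window; the early exit when no partial result survives is a choice made for this model, not something the abstract specifies.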

Updated: 2020-12-01