Continuous top-k approximated join of streaming and evolving distributed data
Semantic Web ( IF 3 ) Pub Date : 2019-10-17 , DOI: 10.3233/sw-190367
Shima Zahmatkesh 1 , Emanuele Della Valle 1

Continuously finding the most relevant (in short, top-k) answers to a query that joins streaming and distributed data is attracting growing attention. In recent years, this has been the case in particular in Social Media and IoT. It is well known that, in those settings, remaining reactive can be challenging, because accessing the distributed data can be highly time consuming as well as rate-limited. In this paper, we investigate the problem of continuous top-k query evaluation over a data stream joined with a distributed dataset in an even more extreme situation: the distributed data evolves. We propose the Topk+N algorithm and the AcquaTop framework. They keep a local replica of the distributed dataset up to date and guarantee reactiveness by construction, but to do so they may need to approximate the result. Therefore, we propose two maintenance policies to update the replica: the Top Selection Maintenance (AT-TSM) policy maximizes the relevancy, while the Border Selection Maintenance (AT-BSM) policy maximizes the accuracy of the top-k result. We contribute a theoretical proof of the correctness of the Topk+N algorithm and we study its complexity. Moreover, we provide empirical evidence that the proposed policies within the AcquaTop framework produce more relevant and accurate results than the state of the art.
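
The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the Topk+N algorithm or either AcquaTop policy, whose selection rules are not given here; it only shows why a stale local replica makes the top-k result approximate and how a limited refresh budget can change it. All names (`top_k_join`, `refresh_replica`, `fetch_remote`, `candidates`, `budget`) and the additive scoring function are assumptions made for illustration.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def top_k_join(
    window: List[Tuple[str, float]],
    replica: Dict[str, float],
    k: int,
    score: Callable[[float, float], float],
) -> List[Tuple[float, str]]:
    """Join the items of one stream window with the local replica by key
    and return the k highest-scoring (score, key) pairs."""
    joined = [
        (score(stream_value, replica[key]), key)
        for key, stream_value in window
        if key in replica
    ]
    return heapq.nlargest(k, joined)


def refresh_replica(
    replica: Dict[str, float],
    fetch_remote: Callable[[str], float],
    candidates: List[str],
    budget: int,
) -> None:
    """Refresh at most `budget` replica entries from the remote source.
    Which keys to put first in `candidates` is the job of a maintenance
    policy; this sketch leaves that choice to the caller."""
    for key in candidates[:budget]:
        replica[key] = fetch_remote(key)


# Hypothetical data: the remote value for "b" has changed since the
# replica was last synchronised.
window = [("a", 3.0), ("b", 5.0), ("c", 1.0)]
replica = {"a": 10.0, "b": 2.0, "c": 8.0}
remote = {"a": 10.0, "b": 9.0, "c": 8.0}


def add(stream_value: float, replica_value: float) -> float:
    return stream_value + replica_value


stale_result = top_k_join(window, replica, k=2, score=add)
refresh_replica(replica, remote.__getitem__, candidates=["b"], budget=1)
fresh_result = top_k_join(window, replica, k=2, score=add)
```

With this data, the stale replica ranks "a" and "c" highest, while refreshing the single entry "b" moves "b" to the top. Choosing which entries to refresh under a fixed budget is the problem the two maintenance policies address.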

Updated: 2019-10-17