Probabilistic Top-k Dominating Queries in Distributed Uncertain Databases
arXiv - CS - Databases, Pub Date: 2021-05-10, DOI: arxiv-2105.04486
Niranjan Rai, Xiang Lian

In many real-world applications, such as business planning and sensor data monitoring, one important yet challenging task is to rank objects (e.g., products, documents, or spatial objects) based on their ranking scores and efficiently return the objects with the highest scores. In practice, because data sources are unreliable, many real-world objects contain noise and are thus imprecise and uncertain. In this paper, we study the problem of the probabilistic top-k dominating (PTD) query over such large-scale uncertain data in a distributed environment, which retrieves, from distributed uncertain databases (on multiple distributed servers), the k uncertain objects that have the largest ranking scores with high confidence. To tackle the distributed PTD problem efficiently, we propose a MapReduce framework for processing PTD queries over distributed uncertain databases. In this framework, we design effective pruning strategies to filter out false alarms in the distributed setting, propose cost-model-based index distribution mechanisms over servers, and develop efficient distributed PTD query processing algorithms. Extensive experiments on both real and synthetic data sets, under various experimental settings, demonstrate the efficiency and effectiveness of the proposed distributed PTD approach.
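To make the notion of a dominating score over uncertain objects concrete, here is a minimal single-machine sketch. It is not the paper's algorithm: the abstract does not give the formal PTD definition, so the sketch assumes one common formulation in which each uncertain object is a small set of weighted instances, smaller attribute values are better, objects are independent, and an object's score is the expected number of other objects it dominates. The names `dominates`, `dominance_probability`, `expected_scores`, `top_k`, and the sample data are all hypothetical, and the MapReduce distribution, pruning, and indexing described in the abstract are omitted.

```python
from itertools import product

# Assumption: an uncertain object is a list of (point, probability) instances
# whose probabilities sum to 1, and smaller attribute values are better.

def dominates(a, b):
    """True if a is no worse than b in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def dominance_probability(u, v):
    """Probability that uncertain object u dominates v, assuming independence."""
    return sum(pu * pv for (a, pu), (b, pv) in product(u, v) if dominates(a, b))

def expected_scores(objects):
    """Expected number of other objects that each object dominates."""
    return {
        name: sum(dominance_probability(u, v)
                  for other, v in objects.items() if other != name)
        for name, u in objects.items()
    }

def top_k(objects, k):
    """Names of the k objects with the highest expected score (ties broken by name)."""
    scores = expected_scores(objects)
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]

# Hypothetical toy data: three 2-D uncertain objects.
objects = {
    "A": [((1, 1), 0.6), ((3, 3), 0.4)],
    "B": [((2, 2), 1.0)],
    "C": [((4, 4), 1.0)],
}
print(top_k(objects, 2))
```

This brute-force version compares every pair of instances, which is the kind of cost that pruning strategies and index distribution are meant to avoid on large distributed data.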

Updated: 2021-05-11