PatchIndex: exploiting approximate constraints in distributed databases
Distributed and Parallel Databases (IF 1.2), Pub Date: 2021-03-06, DOI: 10.1007/s10619-021-07326-1
Steffen Kläbe, Kai-Uwe Sattler, Stephan Baumann

Cloud data warehouse systems lower the barrier to access data analytics. These applications often lack a database administrator and integrate data from various sources, potentially leading to data not satisfying strict constraints. Automatic schema optimization in self-managing databases is difficult in these environments without prior data cleaning steps. In this paper, we focus on constraint discovery as a subtask of schema optimization. Perfect constraints might not exist in these unclean datasets due to a small set of values violating the constraints. Therefore, we introduce the concept of a generic PatchIndex structure, which handles exceptions to given constraints and enables database systems to define these approximate constraints. We apply the concept to the environment of distributed databases, providing parallel index creation approaches and optimization techniques for parallel queries using PatchIndexes. Furthermore, we describe heuristics for automatic discovery of PatchIndex candidate columns and prove the performance benefit of using PatchIndexes in our evaluation.
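
To make the idea of an approximate constraint concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a single-node, in-memory column that is "nearly unique", and it records the row ids that violate uniqueness as a set of exceptions ("patches"). The function names `build_patches` and `distinct_with_patches` are hypothetical and chosen only for illustration; the abstract itself does not specify data structures or algorithms.

```python
from collections import Counter


def build_patches(values):
    """Collect the row ids that violate a uniqueness constraint.

    Assumption for this sketch: a row is an exception if its value
    occurs more than once anywhere in the column.
    """
    counts = Counter(values)
    return {row_id for row_id, value in enumerate(values) if counts[value] > 1}


def distinct_with_patches(values, patches):
    """Compute the distinct values of a nearly unique column.

    Rows outside the patch set are unique by construction, so they are
    passed through without deduplication. Only the (small) set of
    exception rows needs a real distinct operation.
    """
    clean = [value for row_id, value in enumerate(values) if row_id not in patches]
    deduplicated = sorted({values[row_id] for row_id in patches})
    return clean + deduplicated


# A column that is unique except for the repeated values 10 and 11.
column = [10, 11, 12, 11, 13, 14, 10]
patches = build_patches(column)
result = distinct_with_patches(column, patches)
```

In this sketch, the cost of deduplication scales with the number of exceptions rather than with the column size, which is one plausible reading of how a query could benefit from an approximate constraint. Parallel index creation and distributed query optimization, which the abstract mentions, are outside what this example covers.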




Updated: 2021-03-07