Cleaning data with Llunatic
The VLDB Journal (IF 4.2), Pub Date: 2019-11-08, DOI: 10.1007/s00778-019-00586-5
Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro

Data cleaning (or data repairing) is considered a crucial problem in many database-related tasks. It consists in making a database consistent with respect to a given set of constraints. In recent years, repairing methods have been proposed for several classes of constraints. These methods, however, tend to hard-code the strategy to repair conflicting values and are specialized toward specific classes of constraints. In this paper, we develop a general chase-based repairing framework, referred to as Llunatic, in which repairs can be obtained for a large class of constraints and by using different strategies to select preferred values. The framework is based on an elegant formalization in terms of labeled instances and partially ordered preference labels. In this context, we revisit concepts such as upgrades, repairs and the chase. In Llunatic, various repairing strategies can be slotted in, without the need for changing the underlying implementation. Furthermore, Llunatic is the first data repairing system which is DBMS-based. We report experimental results that confirm its good scalability and show that various instantiations of the framework result in repairs of good quality.
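
To make the idea of slotting in different repairing strategies more concrete, here is a minimal Python sketch. It is not Llunatic's chase algorithm or API: it handles only a single functional dependency, it ignores labeled instances and partially ordered preference labels, and every name in it (`repair_fd`, `most_frequent`, `largest`) is hypothetical. It only illustrates how the rule for picking a preferred value can be passed in as a parameter instead of being hard-coded.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Row = Dict[str, Hashable]
Strategy = Callable[[List[Hashable]], Hashable]


def most_frequent(values: List[Hashable]) -> Hashable:
    """Hypothetical strategy: prefer the value that occurs most often."""
    return max(values, key=values.count)


def largest(values: List[Hashable]) -> Hashable:
    """Hypothetical strategy: prefer the largest value (e.g. the latest year)."""
    return max(values)


def repair_fd(rows: List[Row], lhs: str, rhs: str, choose: Strategy) -> List[Row]:
    """Return a copy of `rows` in which the dependency lhs -> rhs holds.

    Rows that agree on `lhs` are grouped, and every row in a group receives
    the single `rhs` value selected by the `choose` strategy.
    """
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for row in rows:
        groups[row[lhs]].append(row[rhs])
    preferred = {key: choose(vals) for key, vals in groups.items()}
    return [{**row, rhs: preferred[row[lhs]]} for row in rows]


# Illustrative data (made up): two rows violate zip -> city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
]
repaired = repair_fd(rows, "zip", "city", most_frequent)
```

Swapping `most_frequent` for `largest` (or any other function with the same signature) changes which value wins without touching `repair_fd`, which is the property the abstract attributes to the framework at a much more general level.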

Updated: 2019-11-08