The VLDB Journal (IF 2.904), Pub Date: 2019-11-08, DOI: 10.1007/s00778-019-00586-5. Floris Geerts, Giansalvatore Mecca, Paolo Papotti, Donatello Santoro
Data cleaning (or data repairing) is considered a crucial problem in many database-related tasks. It consists in making a database consistent with respect to a given set of constraints. In recent years, repairing methods have been proposed for several classes of constraints. These methods, however, tend to hard-code the strategy used to repair conflicting values and are specialized for specific classes of constraints. In this paper, we develop a general chase-based repairing framework, referred to as Llunatic, in which repairs can be obtained for a large class of constraints and by using different strategies to select preferred values. The framework is based on an elegant formalization in terms of labeled instances and partially ordered preference labels. In this context, we revisit concepts such as upgrades, repairs and the chase. In Llunatic, various repairing strategies can be slotted in without changing the underlying implementation. Furthermore, Llunatic is the first data-repairing system that is DBMS-based. We report experimental results that confirm its good scalability and show that various instantiations of the framework result in repairs of good quality.
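The idea of slotting in different strategies for selecting preferred values can be illustrated with a minimal sketch. It is not Llunatic's chase procedure and does not model labeled instances or partially ordered preference labels; it only assumes a single functional dependency over an in-memory list of rows, and every name in it (`repair_fd`, `most_frequent`, `largest`, the `emp_id` and `salary` attributes, the sample data) is hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List

Row = Dict[str, object]
# A strategy receives the conflicting values and returns the preferred one.
Strategy = Callable[[List[object]], object]


def most_frequent(values: List[object]) -> object:
    """Prefer the value that occurs most often (ties: first occurrence)."""
    counts = Counter(values)
    return max(values, key=lambda v: counts[v])


def largest(values: List[object]) -> object:
    """Prefer the largest value, e.g. the highest recorded salary."""
    return max(values)


def repair_fd(rows: List[Row], lhs: str, rhs: str, choose: Strategy) -> List[Row]:
    """Return a copy of `rows` in which the dependency lhs -> rhs holds.

    Rows that agree on `lhs` but disagree on `rhs` are in conflict; the
    conflict is resolved by the pluggable `choose` strategy.
    """
    # Group row positions by their lhs value.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[lhs]].append(i)

    repaired = [dict(row) for row in rows]  # leave the input untouched
    for indices in groups.values():
        values = [rows[i][rhs] for i in indices]
        if len(set(values)) > 1:  # the dependency is violated in this group
            preferred = choose(values)
            for i in indices:
                repaired[i][rhs] = preferred
    return repaired


# Hypothetical data violating emp_id -> salary for employee 1.
rows = [
    {"emp_id": 1, "salary": 50000},
    {"emp_id": 1, "salary": 50000},
    {"emp_id": 1, "salary": 52000},
    {"emp_id": 2, "salary": 40000},
]

by_frequency = repair_fd(rows, "emp_id", "salary", most_frequent)
by_maximum = repair_fd(rows, "emp_id", "salary", largest)
```

Here the same repair routine yields a salary of 50000 for employee 1 under `most_frequent` and 52000 under `largest`, so the choice of preferred value changes without touching `repair_fd` itself.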