Data Preparation
ACM SIGMOD Record (IF 0.9), Pub Date: 2020-12-17, DOI: 10.1145/3444831.3444835
Mazhar Hameed, Felix Naumann

Raw data are often messy: they follow different encodings, records are not well structured, values do not adhere to patterns, etc. Such data are in general not fit to be ingested by downstream applications, such as data analytics tools, or even by data management systems. Obtaining information from raw data therefore relies on some data preparation process. Data preparation is integral to advanced data analysis and data management, not only for data science but for any data-driven application. Existing data preparation tools are operational and useful, but there is still room for improvement and optimization. As data volumes grow and data remain messy, the demand for prepared data increases day by day. To meet this demand, companies and researchers are developing techniques and tools for data preparation. To better understand the available data preparation systems, we have conducted a survey to investigate (1) prominent data preparation tools, (2) distinctive tool features, (3) the need for preliminary data processing even for these tools, and (4) features and abilities that are still lacking. We conclude with an argument in support of automatic and intelligent data preparation beyond traditional and simplistic techniques.
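To make two of the abstract's examples of messiness concrete (mixed encodings and values that do not adhere to a pattern), here is a minimal Python sketch of the kind of preliminary step a preparation pipeline might perform. It is an illustration only, not a method from the survey or from any surveyed tool: the function names `decode_bytes` and `split_by_pattern`, the candidate encoding list, and the ISO-style date pattern are all hypothetical assumptions chosen for the example.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical ordered list of encodings to try (an assumption, not from the survey).
# latin-1 is last because it maps every byte to a character and never fails.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")

# Hypothetical expected pattern for one column: ISO-style dates YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def decode_bytes(raw: bytes, encodings: Iterable[str] = CANDIDATE_ENCODINGS) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate encoding that decodes raw without error."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this encoding does not fit the bytes; try the next one
    raise ValueError("no candidate encoding could decode the input")


def split_by_pattern(values: Iterable[str], pattern: re.Pattern = DATE_PATTERN) -> Tuple[List[str], List[str]]:
    """Partition whitespace-stripped values into those that fully match pattern and those that do not."""
    conforming, nonconforming = [], []
    for value in values:
        value = value.strip()
        if pattern.fullmatch(value):
            conforming.append(value)
        else:
            nonconforming.append(value)
    return conforming, nonconforming


if __name__ == "__main__":
    text, enc = decode_bytes("café".encode("cp1252"))
    print(text, enc)
    ok, bad = split_by_pattern(["2020-12-17", " 17/12/2020 ", "2020-1-5"])
    print(ok, bad)
```

Trying encodings in a fixed order is a simple heuristic; it only reports the first encoding that decodes without error, which is not necessarily the encoding the data was written in. The values flagged as nonconforming would still need a separate, tool- or user-specific repair step.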

Updated: 2020-12-17