A Lightweight Algorithm to Uncover Deep Relationships in Data Tables
arXiv - CS - Databases Pub Date : 2020-09-07 , DOI: arxiv-2009.03358 Jin Cao and Yibo Zhao and Linjun Zhang and Jason Li
Much of the data we collect today is in tabular form, with rows as records and
columns as attributes associated with each record. Understanding the structural
relationships in tabular data can greatly facilitate the data science process.
Traditionally, much of this relational information is stored in the table schema
and maintained by its creators, usually domain experts. In this paper, we
develop automated methods to uncover deep relationships in a single data table
without expert or domain knowledge. Our method can decompose a data table into
layers of smaller tables, revealing its deep structure. The key to our approach
is a computationally lightweight forward addition algorithm, which we developed
to recursively extract the functional dependencies between table columns and
which scales to tables with many columns. With our solution, data scientists
will be provided with automatically generated, data-driven insights when
exploring new data sets.
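
To make the idea of a functional dependency concrete, the following Python sketch checks whether a set of columns determines another column, and then greedily adds columns one at a time until the target is determined. This is only an illustration under stated assumptions: the abstract does not give the details of the authors' forward addition algorithm, so the greedy rule, the function names (`determines`, `count_violations`, `forward_add`), and the sample table are all hypothetical.

```python
from collections import defaultdict


def determines(rows, lhs, rhs):
    """Return True if the columns in `lhs` functionally determine column `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        # The first rhs value seen for a key must match every later one.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def count_violations(rows, lhs, rhs):
    """Count rows that disagree with the most common rhs value in their lhs group."""
    groups = defaultdict(lambda: defaultdict(int))
    for row in rows:
        groups[tuple(row[c] for c in lhs)][row[rhs]] += 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


def forward_add(rows, target, candidates):
    """Greedily grow a determinant set for `target`, one column at a time.

    Hypothetical greedy rule (an assumption, not the paper's method): at each
    step add the column that leaves the fewest violations. Returns the chosen
    columns, or None if the greedy search stalls before reaching zero.
    """
    chosen = []
    remaining = list(candidates)
    violations = count_violations(rows, chosen, target)
    while violations > 0 and remaining:
        best = min(remaining, key=lambda c: count_violations(rows, chosen + [c], target))
        best_violations = count_violations(rows, chosen + [best], target)
        if best_violations >= violations:
            return None  # no single column helps, so the greedy search gives up
        chosen.append(best)
        remaining.remove(best)
        violations = best_violations
    return chosen if violations == 0 else None


# Made-up sample table: each row is a record, each key a column.
rows = [
    {"zip": "10001", "city": "New York", "state": "NY", "order": 1},
    {"zip": "10001", "city": "New York", "state": "NY", "order": 2},
    {"zip": "07030", "city": "Hoboken", "state": "NJ", "order": 3},
    {"zip": "07302", "city": "Jersey City", "state": "NJ", "order": 4},
]

print(determines(rows, ["zip"], "city"))              # True
print(determines(rows, ["state"], "city"))            # False
print(forward_add(rows, "city", ["state", "zip"]))    # ['zip']
```

In this sample, `zip` determines `city`, so the (`zip`, `city`, `state`) columns could in principle be split into a smaller lookup table, which is the kind of decomposition the abstract describes.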
Updated: 2020-09-09