Discovering and merging related analytic datasets
Information Systems (IF 3.0), Pub Date: 2020-01-17, DOI: 10.1016/j.is.2020.101495
Rutian Liu, Eric Simon, Bernd Amann, Stéphane Gançarski

The production of analytic datasets is a significant big data trend and has gone well beyond the scope of traditional IT-governed dataset development. Analytic datasets are now created by data scientists and data analysts using big data frameworks and agile data preparation tools. However, despite the profusion of available datasets, it remains quite difficult for a data analyst to start from a dataset at hand and customize it with additional attributes coming from other existing datasets. This article describes a model and algorithms that exploit automatically extracted and user-defined semantic relationships for extending analytic datasets with new atomic or aggregated attribute values. Our framework is implemented as a REST service in SAP HANA and includes a careful theoretical analysis and practical solutions for several complex data quality issues.

Updated: 2020-01-17