An Approach to Extracting Topic-guided Views from the Sources of a Data Lake
Information Systems Frontiers (IF 6.9), Pub Date: 2020-05-24, DOI: 10.1007/s10796-020-10010-x
Claudia Diamantini, Paolo Lo Giudice, Domenico Potena, Emanuele Storti, Domenico Ursino

In recent years, data lakes have emerged as an effective and efficient support for extracting information and knowledge from huge numbers of highly heterogeneous and rapidly changing data sources. Data lake management requires new techniques that differ substantially from those adopted for data warehouses in the past. In this scenario, one of the most challenging issues is the extraction of topic-guided (i.e., thematic) views from the (very heterogeneous and often unstructured) sources of a data lake. In this paper, we propose a new network-based model to uniformly represent the structured, semi-structured and unstructured sources of a data lake. Then, we present a new approach to "structuring" unstructured data, at least partially. Finally, we define a technique to extract topic-guided views from the sources of a data lake, based on similarity and other semantic relationships among source metadata.




Updated: 2020-05-24