High-Level ETL for Semantic Data Warehouses---Full Version,arXiv - CS - Databases



arXiv - CS - Databases, Pub Date: 2020-06-12, DOI: arxiv-2006.07180
Rudra Pratap Deb Nath, Oscar Romero, Torben Bach Pedersen, and Katja Hose

The popularity of the Semantic Web (SW) encourages organizations to organize and publish semantic data using the RDF model. This growth places new requirements on Business Intelligence (BI) technologies: they must enable On-Line Analytical Processing (OLAP)-like analysis over semantic data. Traditional Extract-Transform-Load (ETL) tools do not support incorporating semantic data into a Data Warehouse (DW) because they do not consider semantic issues in the integration process. In this paper, we propose a layer-based integration process and a set of high-level RDF-based ETL constructs required to define, map, extract, process, transform, integrate, update, and load (multidimensional) semantic data. Unlike other ETL tools, we automate the ETL data flows by creating metadata at the schema level, which relieves ETL developers from the burden of manual mapping at the ETL operation level. We create a prototype, named Semantic ETL Construct (SETLCONSTRUCT), based on the innovative ETL constructs proposed here. To evaluate SETLCONSTRUCT, we use it to create a multidimensional semantic DW by integrating a Danish Business dataset and an EU Subsidy dataset, and we compare it with the previous programmable framework SETLPROG in terms of productivity, development time, and performance. The evaluation shows that 1) SETLCONSTRUCT requires 92% fewer typed characters (Number of Typed Characters, NOTC) than SETLPROG, and SETLAUTO (the extension of SETLCONSTRUCT that generates the ETL execution flow automatically) further reduces the Number of Used Concepts (NOUC) by another 25%; 2) with SETLCONSTRUCT, development time is almost cut in half compared to SETLPROG, and is cut by another 27% with SETLAUTO; 3) SETLCONSTRUCT is scalable and performs similarly to SETLPROG.
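To make the idea of schema-level metadata driving an ETL flow more concrete, here is a minimal Python sketch. It is not SETLCONSTRUCT's API or the authors' method: the `MAPPING` dictionary, the `rows_to_triples` function, the `example.org` IRIs, and the sample rows are all hypothetical, invented for illustration. The only point it shows is that once a mapping is declared at the schema level, the row-level transformation into RDF-style triples needs no per-operation hand coding.

```python
# Hypothetical schema-level mapping: source column -> target RDF property.
# None of these names come from SETLCONSTRUCT; they are illustrative only.
MAPPING = {
    "subject_template": "http://example.org/company/{id}",
    "rdf_type": "http://example.org/onto#Company",
    "properties": {
        "name": "http://example.org/onto#name",
        "city": "http://example.org/onto#city",
    },
}

# Standard IRI of the rdf:type property.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(rows, mapping):
    """Turn source rows into (subject, predicate, object) triples using only the mapping."""
    triples = []
    for row in rows:
        # Build the subject IRI from the template declared in the mapping.
        subject = mapping["subject_template"].format(**row)
        triples.append((subject, RDF_TYPE, mapping["rdf_type"]))
        for column, predicate in mapping["properties"].items():
            value = row.get(column)
            if value is not None:  # skip columns with missing values
                triples.append((subject, predicate, value))
    return triples


# Made-up sample rows, not taken from the datasets named in the abstract.
rows = [
    {"id": "1", "name": "Acme ApS", "city": "Aalborg"},
    {"id": "2", "name": "Beta A/S", "city": None},
]
triples = rows_to_triples(rows, MAPPING)
for triple in triples:
    print(triple)
```

Changing the target schema in this sketch means editing `MAPPING` only; the transformation function stays untouched, which is the kind of effort reduction the abstract attributes to schema-level metadata.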


Updated: 2020-06-15
