


High-level ETL for semantic data warehouses
Semantic Web (IF 3.0), Pub Date: 2021-05-03, DOI: 10.3233/sw-210429
Rudra Pratap Deb Nath (1, 2, 3), Oscar Romero (2), Torben Bach Pedersen (1), Katja Hose (1)


Abstract

The popularity of the Semantic Web (SW) encourages organizations to organize and publish semantic data using the RDF model. This growth poses new requirements for Business Intelligence technologies, which must enable On-Line Analytical Processing (OLAP)-like analysis over semantic data. Traditional Extract-Transform-Load (ETL) tools do not support incorporating semantic data into a Data Warehouse (DW) because they do not consider semantic issues in the integration process. In this paper, we propose a layer-based integration process and a set of high-level RDF-based ETL constructs required to define, map, extract, process, transform, integrate, update, and load (multidimensional) semantic data. Unlike other ETL tools, our approach automates the ETL data flows by creating metadata at the schema level, which relieves ETL developers of the burden of manual mapping at the ETL operation level. Based on the ETL constructs proposed here, we create a prototype named Semantic ETL Construct (SETL_CONSTRUCT). To evaluate SETL_CONSTRUCT, we use it to create a multidimensional semantic DW by integrating a Danish Business dataset and an EU Subsidy dataset, and we compare it with the previous programmable framework SETL_PROG in terms of productivity, development time, and performance. The evaluation shows that 1) SETL_CONSTRUCT requires 92% fewer typed characters (Number of Typed Characters, NOTC) than SETL_PROG, and SETL_AUTO (the extension of SETL_CONSTRUCT that generates ETL execution flows automatically) reduces the Number of Used Concepts (NOUC) by a further 25%; 2) SETL_CONSTRUCT cuts development time almost in half compared to SETL_PROG, and SETL_AUTO cuts it by another 27%; and 3) SETL_CONSTRUCT is scalable and performs similarly to SETL_PROG. We also evaluate our approach qualitatively by interviewing two ETL experts.
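To make the idea of driving ETL data flows from schema-level metadata more concrete, here is a minimal Python sketch. It is not SETL_CONSTRUCT's API: the class `ConceptMapping`, the function `apply_mappings`, the `ex:` and `sdw:` prefixes, and the IRI template are all hypothetical, and triples are modelled as plain string tuples rather than with an RDF library. It assumes the target DW uses the QB4OLAP vocabulary, whose `qb4o:memberOf` property links a level member to its level. The developer declares only which source class maps to which target level and which source properties map to which level attributes; the instance-level transformation is then derived automatically.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object); IRIs are plain prefixed strings here.
Triple = tuple[str, str, str]

RDF_TYPE = "rdf:type"


@dataclass
class ConceptMapping:
    """Hypothetical schema-level mapping from a source class to a target DW level."""
    source_class: str
    target_level: str
    # Maps a source property to a target level attribute.
    property_map: dict[str, str] = field(default_factory=dict)
    # Hypothetical template for target member IRIs, filled with the source IRI's local name.
    iri_template: str = "sdw:{local}"


def local_name(iri: str) -> str:
    """Return the part of an IRI after its last ':' or '/'."""
    return iri.replace("/", ":").rsplit(":", 1)[-1]


def apply_mappings(source: list[Triple], mappings: list[ConceptMapping]) -> list[Triple]:
    """Derive target triples from source triples using only schema-level mappings."""
    # Index each subject's (predicate, object) pairs once.
    by_subject: dict[str, list[tuple[str, str]]] = {}
    for s, p, o in source:
        by_subject.setdefault(s, []).append((p, o))

    target: list[Triple] = []
    for m in mappings:
        # Every instance of the mapped source class becomes a member of the target level.
        instances = [s for s, p, o in source if p == RDF_TYPE and o == m.source_class]
        for s in instances:
            member = m.iri_template.format(local=local_name(s))
            target.append((member, "qb4o:memberOf", m.target_level))
            # Copy only the properties the mapping declares; others are skipped.
            for p, o in by_subject.get(s, []):
                if p in m.property_map:
                    target.append((member, m.property_map[p], o))
    return target


if __name__ == "__main__":
    source = [
        ("ex:c1", "rdf:type", "ex:Company"),
        ("ex:c1", "ex:name", '"Acme ApS"'),
        ("ex:c1", "ex:zip", '"8000"'),
        ("ex:c2", "rdf:type", "ex:Person"),
    ]
    mappings = [ConceptMapping("ex:Company", "sdw:Company", {"ex:name": "sdw:companyName"})]
    for triple in apply_mappings(source, mappings):
        print(triple)
    # ('sdw:c1', 'qb4o:memberOf', 'sdw:Company')
    # ('sdw:c1', 'sdw:companyName', '"Acme ApS"')
```

Because each mapping is declared once per concept rather than once per ETL operation, supporting a new source property only requires extending `property_map`; unmapped properties (here `ex:zip`) and instances of unmapped classes (here `ex:Person`) are simply left out of the target.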


Updated: 2021-05-05
