On data lake architectures and metadata management
Journal of Intelligent Information Systems (IF 2.3), Pub Date: 2020-06-26, DOI: 10.1007/s10844-020-00608-7
Pegdwendé Sawadogo, Jérôme Darmont

Over the past two decades, we have witnessed an exponential increase in data production worldwide. So-called big data generally come from transactional systems, and even more so from the Internet of Things and social media. They are mainly characterized by volume, velocity, variety and veracity issues. These issues strongly challenge traditional data management and analysis systems, and the concept of the data lake was introduced to address them. A data lake is a large repository of raw data that stores and manages all of a company's data, whatever its format. However, the data lake concept remains ambiguous for many researchers and practitioners, who often confuse it with the Hadoop technology. Thus, we provide in this paper a comprehensive survey of the state of the art in data lake design approaches. We particularly focus on data lake architectures and metadata management, which are key issues in successful data lakes. We also discuss the pros and cons of data lakes and their design alternatives.

Updated: 2020-06-26