Architecture of a distributed storage that combines file system, memory and computation in a single layer
The VLDB Journal (IF 4.2), Pub Date: 2020-02-26, DOI: 10.1007/s00778-020-00605-w
Jia Zou, Arun Iyengar, Chris Jermaine

Storage and memory systems for modern data analytics are heavily layered: shared persistent data, cached data, and non-shared execution data are managed in separate systems, such as a distributed file system like HDFS, an in-memory file system like Alluxio, and a computation framework like Spark. Such layering introduces significant performance and management costs. In this paper, we propose a single system called Pangea that manages all data, both intermediate and long-lived, together with buffering and caching, page replacement, data placement optimization, and failure recovery, in one monolithic distributed storage system without any layering. We present a detailed performance evaluation of Pangea and show that its performance compares favorably with several widely used layered systems such as Spark.

Updated: 2020-02-26