A high-bandwidth and low-cost data processing approach with heterogeneous storage architectures
Personal and Ubiquitous Computing, Pub Date: 2020-03-24, DOI: 10.1007/s00779-020-01383-6
Bing Wei, Limin Xiao, Wei Wei, Yao Song, Baicheng Yan, Zhisheng Huo

How to efficiently process big data at a low cost is a substantial challenge. Many efficient and economical data processing approaches have been proposed in fields including business, scientific research, and public administration. Unfortunately, seismic data processing in the field of oil exploration has not achieved the same level of development. While many storage architectures, such as network-attached storage (NAS) and storage area network (SAN), have been widely used to process massive amounts of seismic data, these architectures are expensive in terms of bandwidth and capacity. In this paper, we propose a high-bandwidth and low-cost approach to fill this gap. NASStore is our data store built on NAS for processing seismic data. However, NASStore cannot provide high bandwidth at a low cost in data-intensive computing scenarios, because of the massive bandwidth requirement and the huge volume of data to be stored. Distributed file systems, such as the Hadoop Distributed File System (HDFS), offer an alternative approach to storing data. HDFS delivers high aggregate performance to user applications while running on inexpensive commodity hardware. To overcome the shortcomings of NASStore, we first present HDFSStore, which is built on HDFS for processing seismic data. We then couple NASStore and HDFSStore to construct a new hybrid data store, called SeisStore, in which efficient parallel write, read, and update mechanisms are employed to improve system performance. Experimental results show that SeisStore reduces storage cost by up to 23.20% compared with NASStore and improves access bandwidth by up to 478.84% and 16.99% compared with NASStore and HDFSStore, respectively.
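
The abstract does not say how SeisStore places data across its two backends, so the following is only a minimal sketch of the general idea of coupling two stores and accessing them in parallel. Everything in it is an assumption made for illustration: the class names `InMemoryStore` and `HybridStore`, the round-robin chunk placement, and the in-memory dictionaries standing in for NAS and HDFS. It is not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for one backend (e.g. a NAS- or HDFS-based store)."""

    def __init__(self, name):
        self.name = name
        self._blocks = {}

    def write(self, key, data):
        self._blocks[key] = bytes(data)

    def read(self, key):
        return self._blocks[key]


class HybridStore:
    """Stripes fixed-size chunks across backends in round-robin order.

    Round-robin placement is an assumption of this sketch; the abstract
    does not describe SeisStore's actual placement policy.
    """

    def __init__(self, backends, chunk_size=4):
        self.backends = list(backends)
        self.chunk_size = chunk_size
        self._layout = {}  # object name -> number of chunks written

    def _backend_for(self, index):
        return self.backends[index % len(self.backends)]

    def write(self, name, data):
        # Split the payload into chunks and write them concurrently.
        chunks = [data[i:i + self.chunk_size]
                  for i in range(0, len(data), self.chunk_size)]
        with ThreadPoolExecutor(max_workers=len(self.backends)) as pool:
            futures = [pool.submit(self._backend_for(i).write, (name, i), chunk)
                       for i, chunk in enumerate(chunks)]
            for future in futures:
                future.result()  # re-raise any write error
        self._layout[name] = len(chunks)

    def read(self, name):
        # Fetch all chunks concurrently; pool.map preserves chunk order.
        count = self._layout[name]
        with ThreadPoolExecutor(max_workers=len(self.backends)) as pool:
            parts = pool.map(
                lambda i: self._backend_for(i).read((name, i)), range(count))
            return b"".join(parts)
```

In this sketch, writing a 10-byte object with `chunk_size=4` to a `HybridStore` built from two backends places the first and third chunks on the first backend and the second chunk on the other, and `read` reassembles them in order. A real system would also have to handle placement by cost and bandwidth, updates, and failures, none of which are modeled here.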




Updated: 2020-03-24