Abstract
Processing big data efficiently at a low cost is a substantial challenge. Many efficient and economical data processing approaches have been proposed in fields including business, scientific research, and public administration. Unfortunately, seismic data processing in the field of oil exploration has not reached the same level of development. While storage architectures such as network-attached storage (NAS) and storage area network (SAN) have been widely used to process massive amounts of seismic data, these architectures are expensive in terms of bandwidth and capacity. In this paper, we propose a high-bandwidth and low-cost approach to fill this gap. NASStore is our data store built on NAS for processing seismic data. However, it cannot provide high bandwidth at a low cost in data-intensive computing scenarios, because of the massive bandwidth requirement and the huge volume of data to be stored. Distributed file systems, such as the Hadoop Distributed File System (HDFS), offer an alternative approach to storing data. HDFS delivers high aggregate performance to user applications while running on inexpensive commodity hardware. To overcome the shortcomings of NASStore, we first present HDFSStore, which is built on HDFS for processing seismic data. We then couple NASStore and HDFSStore to construct a new hybrid data store, called SeisStore, in which efficient parallel write, read, and update mechanisms are employed to improve system performance. The experimental results show that SeisStore reduces storage cost by up to 23.20% compared with NASStore and improves access bandwidth by up to 478.84% and 16.99% compared with NASStore and HDFSStore, respectively.
Funding
This work was supported by the National Key R&D Program of China under Grant No. 2018YFB0203901, the National Natural Science Foundation of China under Grant No. 61772053, the fund of the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2017ZX-10, the Science Challenge Project under Grant No. TZ2016002, and the China Postdoctoral Science Foundation under Grant No. 2018M641154.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wei, B., Xiao, L., Wei, W. et al. A high-bandwidth and low-cost data processing approach with heterogeneous storage architectures. Pers Ubiquit Comput 27, 159–176 (2023). https://doi.org/10.1007/s00779-020-01383-6
DOI: https://doi.org/10.1007/s00779-020-01383-6