A high-bandwidth and low-cost data processing approach with heterogeneous storage architectures

Wei, Bing; Xiao, Limin; Wei, Wei; Song, Yao; Yan, Baicheng; Huo, Zhisheng

doi:10.1007/s00779-020-01383-6

Original Article
Published: 24 March 2020

Volume 27, pages 159–176 (2023)
Personal and Ubiquitous Computing

Bing Wei ORCID: orcid.org/0000-0001-7279-3220^1,2,
Limin Xiao^1,2,
Wei Wei^3,
Yao Song^1,2,
Baicheng Yan^1,2 &
Zhisheng Huo^1,2

Abstract

How to process big data efficiently and at low cost is a substantial challenge. Many efficient and economical data processing approaches have been proposed in fields including business, scientific research, and public administration. Unfortunately, seismic data processing in the field of oil exploration has not reached the same level of development. Although storage architectures such as network-attached storage (NAS) and storage area networks (SAN) have been widely used to process massive amounts of seismic data, these architectures are expensive in terms of both bandwidth and capacity. In this paper, we propose a high-bandwidth and low-cost approach to fill this gap. NASStore is our NAS-based data store for processing seismic data. However, it cannot provide high bandwidth at low cost in data-intensive computing scenarios, because of the massive bandwidth requirement and the huge volume of data to be stored. Distributed file systems, such as the Hadoop Distributed File System (HDFS), offer an alternative way to store data: they deliver high aggregate performance to user applications while running on inexpensive commodity hardware. To overcome the shortcomings of NASStore, we first present HDFSStore, a data store built on HDFS for processing seismic data. We then couple NASStore and HDFSStore into a new hybrid data store, called SeisStore, which employs efficient parallel write, read, and update mechanisms to improve system performance. Experimental results show that, compared with NASStore, SeisStore reduces storage cost by up to 23.20%, and that it improves access bandwidth by up to 478.84% over NASStore and by up to 16.99% over HDFSStore.
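The percentages in the abstract are easy to misread. A 478.84% bandwidth improvement means roughly 5.79 times the NASStore bandwidth, not 4.79 times. The short sketch below converts each quoted figure into a new/old ratio. It assumes that "improvement" means (new - old) / old and that "cost reduction" means (old - new) / old. The abstract does not state these definitions explicitly, and the helper names are illustrative only.

```python
def ratio_from_improvement(percent: float) -> float:
    """Convert a relative improvement in percent into a new/old ratio.

    Assumption: improvement = (new - old) / old * 100.
    """
    return 1.0 + percent / 100.0


def ratio_from_reduction(percent: float) -> float:
    """Convert a relative reduction in percent into a new/old ratio.

    Assumption: reduction = (old - new) / old * 100.
    """
    return 1.0 - percent / 100.0


# Best-case ("up to") figures quoted in the abstract.
bw_vs_nas = ratio_from_improvement(478.84)  # SeisStore bandwidth / NASStore bandwidth
bw_vs_hdfs = ratio_from_improvement(16.99)  # SeisStore bandwidth / HDFSStore bandwidth
cost_vs_nas = ratio_from_reduction(23.20)   # SeisStore storage cost / NASStore storage cost

print(f"SeisStore vs NASStore bandwidth:    {bw_vs_nas:.4f}x")
print(f"SeisStore vs HDFSStore bandwidth:   {bw_vs_hdfs:.4f}x")
print(f"SeisStore vs NASStore storage cost: {cost_vs_nas:.4f}x")
```

Under these assumptions, SeisStore reaches about 5.79x the NASStore bandwidth and about 1.17x the HDFSStore bandwidth. Its storage cost is about 0.77x that of NASStore. All three are best-case figures and may come from different experiments.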

Similar content being viewed by others

A Study on Seismic Big Data Handling at Seismic Exploration Industry

ONFS: a hierarchical hybrid file system based on memory, SSD, and HDD for high performance computers
Article, 01 December 2017. Xin Liu, Yu-tong Lu, … Ying Lu

Big data storage and management in SaaS applications
Article, 22 September 2017. Xi Zheng, Min Fu & Mohit Chugh

Funding

This work was supported by the National Key R&D Program of China under Grant No. 2018YFB0203901, the National Natural Science Foundation of China under Grant No. 61772053, the fund of the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2017ZX-10, the Science Challenge Project (No. TZ2016002), and the China Postdoctoral Science Foundation under Grant No. 2018M641154.

Author information

Authors and Affiliations

State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China
Bing Wei, Limin Xiao, Yao Song, Baicheng Yan & Zhisheng Huo
School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
Bing Wei, Limin Xiao, Yao Song, Baicheng Yan & Zhisheng Huo
School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, 710048, China
Wei Wei

Corresponding authors

Correspondence to Limin Xiao or Wei Wei.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wei, B., Xiao, L., Wei, W. et al. A high-bandwidth and low-cost data processing approach with heterogeneous storage architectures. Pers Ubiquit Comput 27, 159–176 (2023). https://doi.org/10.1007/s00779-020-01383-6

Received: 10 September 2019
Accepted: 19 February 2020
Published: 24 March 2020
Issue Date: April 2023
DOI: https://doi.org/10.1007/s00779-020-01383-6
