A high-bandwidth and low-cost data processing approach with heterogeneous storage architectures

  • Original Article
  • Personal and Ubiquitous Computing

Abstract

Processing big data efficiently at low cost is a substantial challenge. Many efficient and economical data processing approaches have been proposed in fields such as business, scientific research, and public administration. Unfortunately, seismic data processing in oil exploration has not reached the same level of development. While many storage architectures, such as network-attached storage (NAS) and storage area network (SAN), have been widely used to process massive amounts of seismic data, these architectures are expensive in terms of bandwidth and capacity. In this paper, we propose a high-bandwidth and low-cost approach to fill this gap. NASStore is our data store built on NAS for processing seismic data. However, it cannot provide high bandwidth at low cost in data-intensive computing scenarios, because of the massive bandwidth requirement and the huge volume of data to be stored. Distributed file systems, such as the Hadoop Distributed File System (HDFS), offer an alternative approach to storing data. They deliver high aggregate performance to user applications while running on inexpensive commodity hardware. To overcome the shortcomings of NASStore, we first present HDFSStore, which is built on HDFS for processing seismic data. We then couple NASStore and HDFSStore to construct a new hybrid data store, called SeisStore, in which efficient parallel write, read, and update mechanisms are employed to improve system performance. The experimental results show that SeisStore reduces storage cost by up to 23.20% compared with NASStore, and improves access bandwidth by up to 478.84% and 16.99% compared with NASStore and HDFSStore, respectively.
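The reported percentages can be easier to compare when restated as multiplicative factors. The short Python sketch below does that conversion. It assumes that "improves by p%" means (new − old) / old and that "reduces by p%" means (old − new) / old; the abstract does not state the exact definitions, and the function names are illustrative only, not taken from the paper.

```python
def improvement_to_factor(percent: float) -> float:
    """Turn an 'improved by percent%' figure into new/old (assumed definition)."""
    return 1.0 + percent / 100.0


def reduction_to_factor(percent: float) -> float:
    """Turn a 'reduced by percent%' figure into new/old (assumed definition)."""
    return 1.0 - percent / 100.0


# Best-case ("up to") figures quoted in the abstract.
bandwidth_vs_nasstore = improvement_to_factor(478.84)
bandwidth_vs_hdfsstore = improvement_to_factor(16.99)
cost_vs_nasstore = reduction_to_factor(23.20)

print(f"Bandwidth vs NASStore:  {bandwidth_vs_nasstore:.2f}x")
print(f"Bandwidth vs HDFSStore: {bandwidth_vs_hdfsstore:.2f}x")
print(f"Storage cost vs NASStore: {cost_vs_nasstore:.2f}x")
```

Under these assumptions, the best-case bandwidth is a little under six times that of NASStore and about 1.17 times that of HDFSStore, while storage cost falls to roughly 77% of the NASStore cost. These are upper bounds ("up to"), not averages.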

Funding

This work was supported by the National Key R&D Program of China under Grant No. 2018YFB0203901, the National Natural Science Foundation of China under Grant No. 61772053, the fund of the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2017ZX-10, the Science Challenge Project under Grant No. TZ2016002, and the China Postdoctoral Science Foundation under Grant No. 2018M641154.

Author information

Corresponding authors

Correspondence to Limin Xiao or Wei Wei.

About this article

Cite this article

Wei, B., Xiao, L., Wei, W. et al. A high-bandwidth and low-cost data processing approach with heterogeneous storage architectures. Pers Ubiquit Comput 27, 159–176 (2023). https://doi.org/10.1007/s00779-020-01383-6

  • DOI: https://doi.org/10.1007/s00779-020-01383-6
