Abstract
In the era of data-intensive computing, large-scale applications in both the scientific and BigData communities demonstrate unique I/O requirements, leading to a proliferation of different storage devices and software stacks, many of which have conflicting requirements. Further, new hardware technologies and system designs create a hierarchical composition that may be ideal for computational storage operations. In this article, we investigate how to support a wide variety of conflicting I/O workloads under a single storage system. We introduce the idea of a Label, a new data representation, and present LABIOS, a new, distributed, Label-based I/O system. LABIOS boosts I/O performance by up to 17× via asynchronous I/O, supports heterogeneous storage resources, offers storage elasticity, and promotes in situ analytics and software-defined storage support via data provisioning. LABIOS demonstrates the effectiveness of storage bridging to support the convergence of HPC and BigData workloads on a single platform.
Bridging Storage Semantics Using Data Labels and Asynchronous I/O