Fine-grained management of I/O optimizations based on workload characteristics

  • Research Article
  • Frontiers of Computer Science

Abstract

With the advent of new computing paradigms, parallel file systems serve not only traditional scientific computing applications but also non-scientific applications in areas such as financial computing, business, and public administration. Because a parallel file system provides storage services for multiple applications, it must meet a variety of requirements. However, parallel file systems usually provide a unified storage solution, which cannot meet specific application needs. In this paper, an extended file handle scheme is proposed to deal with this problem. The original file handle is extended to record I/O optimization information, which allows file systems to specify optimizations for a file or directory based on workload characteristics. Fine-grained management of I/O optimizations can therefore be achieved. On the basis of the extended file handle scheme, data prefetching and small file optimization mechanisms are proposed for parallel file systems. The experimental results show that the proposed approach improves the aggregate throughput of the overall system by up to 189.75%.
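The idea of a file handle that records I/O optimization information can be illustrated with a minimal sketch. This is not the authors' implementation: the names `IOOpt`, `ExtendedHandle`, `create_handle`, and `plan_read`, the flag layout, the rule that a file inherits its parent directory's flags, and the fixed read-ahead window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntFlag


class IOOpt(IntFlag):
    """Hypothetical optimization flags; the names are illustrative only."""
    NONE = 0
    PREFETCH = 1
    SMALL_FILE = 2


@dataclass(frozen=True)
class ExtendedHandle:
    """A file handle extended with per-object optimization flags (assumed layout)."""
    object_id: int
    opts: IOOpt = IOOpt.NONE


def create_handle(object_id, parent=None, opts=None):
    """Create a handle. Assumption: with no explicit flags, a new object
    inherits the flags recorded in its parent directory's handle."""
    if opts is None:
        opts = parent.opts if parent is not None else IOOpt.NONE
    return ExtendedHandle(object_id, opts)


def plan_read(handle, offset, length, prefetch_window=4):
    """Return the (offset, length) ranges a server might fetch for one read."""
    ranges = [(offset, length)]
    if handle.opts & IOOpt.PREFETCH:
        # Assumed policy: read ahead a fixed number of same-sized blocks.
        ranges.append((offset + length, length * prefetch_window))
    return ranges


# Usage: enable prefetching on a directory, then read a file created in it.
directory = create_handle(1, opts=IOOpt.PREFETCH)
data_file = create_handle(2, parent=directory)
print(plan_read(data_file, 0, 4096))
```

Keeping the flags in the handle means each request carries its own policy, so in this sketch the read path can choose an optimization per file without consulting a system-wide setting.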

Acknowledgements

This work was supported by the National Key R&D Program of China (2018YFB0203901), the National Natural Science Foundation of China (Grant No. 61772053), the Science Challenge Project (No. TZ2016002), and the fund of the State Key Laboratory of Software Development Environment (SKLSDE-2017ZX-10).

Author information

Corresponding author

Correspondence to Limin Xiao.

Additional information

An earlier version of this paper, entitled “I/O Optimizations Based on Workload Characteristics for Parallel File Systems”, was presented at the International Conference on Network and Parallel Computing (NPC 2019).

Bing Wei received the BS degree in electrical engineering and the MS degree in computer science from Capital Normal University, China, in 2012 and 2015, respectively. He is currently pursuing a PhD degree in computer science at Beihang University, China. His main research interests include distributed file systems, high performance computing, software engineering, and clusters.

Limin Xiao received the BS degree in computer science from Tsinghua University, China, in 1993, and the MS and PhD degrees in computer science from the Institute of Computing, Chinese Academy of Sciences, China, in 1996 and 1998, respectively. He is a professor at the School of Computer Science and Engineering, Beihang University, China, and a senior member of the China Computer Federation. His main research areas are computer architecture, computer system software, high performance computing, virtualization, and cloud computing.

Bingyu Zhou received the bachelor's degree in computer science and technology from Beijing Wuzi University, China, in 2015, and the MS degree in computer science from Beihang University, China, in 2019. Her main research interests include bandwidth allocation and optimization of file systems.

Guangjun Qin received the MS degree in computer application technology from Zhengzhou University, China, in 2006 and the PhD degree in computer architecture from Beihang University, China, in 2015. From 2015 to 2017, he was a postdoctoral fellow at Beihang University, China. Since 2017, he has been a lecturer at the College of Smart City, Beijing Union University, China. His main research areas are computer architecture, storage systems, information security, and big data analytics.

Baicheng Yan received the BS degree in computer science and technology from Harbin Engineering University, China, in 2016. He is currently pursuing a PhD degree in computer science at Beihang University, China. His research interests include high performance computing and parallel and distributed computing.

Zhisheng Huo is a postdoctoral fellow at the School of Computer Science and Engineering, Beihang University, China. His research interests include high performance computing, big data storage, bandwidth allocation and optimization of file systems, and distributed storage systems.

Cite this article

Wei, B., Xiao, L., Zhou, B. et al. Fine-grained management of I/O optimizations based on workload characteristics. Front. Comput. Sci. 15, 153102 (2021). https://doi.org/10.1007/s11704-020-9344-1

  • DOI: https://doi.org/10.1007/s11704-020-9344-1
