Popularity-based full replica caching for erasure-coded distributed storage systems

Cluster Computing

Abstract

In most storage systems, storage nodes keep data on a local filesystem. Unless they have a dedicated caching layer, they therefore benefit from the usual filesystem cache held in the host’s free memory. In erasure-coded storage systems, however, caching is effective only if all the systematic fragments of an object are in the cache. In this work, we propose a new caching policy that adapts traditional methods to erasure-coded storage systems. The main idea of our solution is to cache a full replica of an object rather than its individual fragments. A simulation-based evaluation shows that our full replica solution improves the cache hit ratio and reduces the cache waste ratio compared to the traditional caching method. An experimental evaluation of our implementation confirms these results and also shows that cache hits on full replicas yield better request response times.
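The observation that fragment-level caching only pays off when every systematic fragment is cached can be illustrated with a small sketch. This is not the policy proposed in the article, whose details are not given in the abstract; it is a toy model under stated assumptions: a hypothetical object split into `K = 4` systematic fragments, one independent LRU cache per storage node, and a separate cache that holds whole objects. The names `LRUCache`, `fragment_hit`, and `full_replica_hit` are invented for illustration.

```python
from collections import OrderedDict

K = 4  # assumed number of systematic fragments per object (hypothetical value)


class LRUCache:
    """Minimal LRU cache; capacity is counted in fragment-sized units."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> size in units

    def used(self):
        return sum(self.items.values())

    def __contains__(self, key):
        return key in self.items

    def put(self, key, size=1):
        if key in self.items:
            self.items.move_to_end(key)  # refresh recency
            return
        # Evict least recently used entries until the new one fits.
        while self.items and self.used() + size > self.capacity:
            self.items.popitem(last=False)
        if size <= self.capacity:
            self.items[key] = size


def fragment_hit(obj, node_caches):
    """A read is a cache hit only if all K fragments are cached on their nodes."""
    return all((obj, i) in node_caches[i] for i in range(K))


def full_replica_hit(obj, replica_cache):
    """A read is a cache hit as soon as the whole object is cached."""
    return obj in replica_cache


# Fragment-level caching: one small cache per node, each holding one fragment of "a".
node_caches = [LRUCache(capacity=1) for _ in range(K)]
for i in range(K):
    node_caches[i].put(("a", i))

# Unrelated traffic on node 3 evicts its fragment of "a".
node_caches[3].put(("b", 3))

# Full replica caching: the same total capacity (K units) holds the whole object.
replica_cache = LRUCache(capacity=K)
replica_cache.put("a", size=K)

frag_hit = fragment_hit("a", node_caches)
wasted = sum(("a", i) in node_caches[i] for i in range(K))
full_hit = full_replica_hit("a", replica_cache)
print(frag_hit, wasted, full_hit)
```

In this toy scenario a single eviction on one node turns the read of object `a` into a miss, while the three fragments still cached on the other nodes occupy memory without serving any request. That is one way to picture the cache waste the abstract refers to; the full replica cache, by contrast, either holds the object or does not.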

Author information

Corresponding author

Correspondence to Hana Baccouch.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ruty, G., Baccouch, H., Nguyen, V. et al. Popularity-based full replica caching for erasure-coded distributed storage systems. Cluster Comput 24, 3173–3186 (2021). https://doi.org/10.1007/s10586-021-03317-0

  • DOI: https://doi.org/10.1007/s10586-021-03317-0
