Efficient compute node-local replication mechanisms for NVRAM-centric data structures

  • Special Issue Paper
The VLDB Journal

Abstract

The long-awaited nonvolatile random-access memory (NVRAM) technology is finally publicly available on the market and requires significant changes to the architecture of in-memory database systems. Since such hybrid DRAM–NVRAM database systems may be able to keep the primary data persistent solely in NVRAM, efficient replication mechanisms need to be considered to prevent base data losses and to guarantee high availability in case of various persistent memory failures. In this article, we argue for a software-based replication approach and present compute node-local mechanisms that provide the building blocks, generally available on most platforms, for efficient NVRAM replication with low latency and a minimal throughput penalty. In our evaluation, based on both real NVRAM hardware and DRAM-backed emulation, we measured up to 10× less overhead for our optimized replication mechanisms compared to the basic replication mechanism of the Intel Persistent Memory Development Kit (PMDK). Finally, we present a lightweight switching approach that enables the adaptive online selection of the best replication mechanism for a given situation.

Notes

  1. https://github.com/pmem/pmdk.

  2. https://github.com/pmem/pmemkv.

  3. _mm512_mask_prefetch_i64scatter_pd intrinsic.

  4. _mm512_mask_i64scatter_epi32 or _mm512_mask_i64scatter_epi64 intrinsic.

  5. https://github.com/pmem/pmemkv/.

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful and constructive comments that greatly contributed to improving the final version of the article. This work is partly funded (i) by the German Research Foundation (DFG) in the context of the project “Self-Recoverable and Highly Available Data Structures for NVRAM-centric Database Systems” (LE-1416/27-1), (ii) by DFG-CRC 912 (HAEC) and (iii) by the Cluster of Excellence “Center for Advancing Electronics Dresden” (Orchestration Path).

Author information

Corresponding author

Correspondence to Mikhail Zarubin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zarubin, M., Kissinger, T., Habich, D. et al. Efficient compute node-local replication mechanisms for NVRAM-centric data structures. The VLDB Journal 29, 775–795 (2020). https://doi.org/10.1007/s00778-019-00549-w

  • DOI: https://doi.org/10.1007/s00778-019-00549-w
