Abstract
Nonvolatile random-access memory (NVRAM), a long-awaited technology, is finally publicly available on the market and requires significant changes to the architecture of in-memory database systems. Since such hybrid DRAM–NVRAM database systems may keep the primary data persistent solely in NVRAM, efficient replication mechanisms are needed to prevent loss of base data and to guarantee high availability in the face of various persistent memory failures. In this article, we argue for a software-based replication approach and present compute node-local mechanisms that provide the building blocks, generally available on most platforms, for efficient NVRAM replication with low latency and a minimal throughput penalty. In our evaluation, based on both real NVRAM hardware and DRAM-backed emulation, we measured up to 10× less overhead for our optimized replication mechanisms compared to the basic replication mechanism of the Intel Persistent Memory Development Kit (PMDK). Finally, we present a lightweight switching approach that enables adaptive online selection of the best replication mechanism for a given situation.
Notes
_mm512_mask_prefetch_i64scatter_pd intrinsic.
_mm512_mask_i64scatter_epi32 or _mm512_mask_i64scatter_epi64 intrinsic.
Acknowledgements
The authors would like to thank the anonymous reviewers for their helpful and constructive comments that greatly contributed to improving the final version of the article. This work is partly funded (i) by the German Research Foundation (DFG) in the context of the project “Self-Recoverable and Highly Available Data Structures for NVRAM-centric Database Systems” (LE-1416/27-1), (ii) by DFG-CRC 912 (HAEC) and (iii) by the Cluster of Excellence “Center for Advancing Electronics Dresden” (Orchestration Path).
Cite this article
Zarubin, M., Kissinger, T., Habich, D. et al. Efficient compute node-local replication mechanisms for NVRAM-centric data structures. The VLDB Journal 29, 775–795 (2020). https://doi.org/10.1007/s00778-019-00549-w
DOI: https://doi.org/10.1007/s00778-019-00549-w