Better database cost/performance via batched I/O on programmable SSD

  • Regular Paper
  • The VLDB Journal

Abstract

Data should be placed at the most cost- and performance-effective tier in the storage hierarchy. While performance and cost decrease with distance from the CPU, the cost/performance trade-off depends on how efficiently data can be moved across tiers. Log structuring improves this cost/performance by writing batches of pages from main memory to secondary storage using a conventional block-at-a-time I/O interface. However, log structuring incurs overhead in the form of recovery and garbage collection. With computational Solid-State Drives, it is now possible to design a storage interface that minimizes this overhead. In this paper, we offload log structuring from the CPU to the SSD. We define a new batch I/O storage interface and we design a Flash Translation Layer that takes care of log structuring on the SSD side. This removes the CPU computational and I/O load associated with recovery and garbage collection. We compare the performance of the Bw-tree key-value store with its LLAMA host-based log structuring to the same key-value software stack executing on a computational SSD equipped with a batch I/O interface. Our experimental results show the benefits of eliminating redundancies, minimizing interactions across storage layers, and avoiding the CPU cost of providing log structuring.
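The abstract does not spell out the batch I/O interface or the FTL's internals, so the following is only a minimal Python sketch, under our own assumptions, of the general idea it describes. The host hands the SSD a whole batch of pages. The device-side FTL appends them out of place to a log of erase blocks, keeps the logical-to-physical mapping table, and reclaims space by garbage collection, so the host runs none of this logic. The names `BatchFTL`, `write_batch`, `pages_per_block`, and the greedy victim policy are hypothetical illustrations, not the paper's OXBlock API, and recovery and crash atomicity are not modeled.

```python
from __future__ import annotations


class BatchFTL:
    """Hypothetical device-side FTL that log-structures host-supplied batches.

    Illustrative only: not the paper's OXBlock FTL or its batch I/O API.
    """

    def __init__(self, num_blocks: int, pages_per_block: int) -> None:
        self.pages_per_block = pages_per_block
        # Each erase block holds (logical page id, data) slots in append order.
        self.blocks: list[list[tuple[int, bytes]]] = [[] for _ in range(num_blocks)]
        # Logical page id -> (block, offset) of its most recent copy.
        self.mapping: dict[int, tuple[int, int]] = {}
        self.active = 0      # block currently receiving appends
        self.gc_moves = 0    # live pages relocated by garbage collection

    def write_batch(self, pages: list[tuple[int, bytes]]) -> None:
        """Append a host-supplied batch of (logical page id, data) to the log."""
        for lpid, data in pages:
            self._append(lpid, data)

    def read(self, lpid: int) -> bytes:
        block, offset = self.mapping[lpid]
        return self.blocks[block][offset][1]

    def _is_valid(self, block: int, offset: int, lpid: int) -> bool:
        # A slot is live only if the mapping table still points at it.
        return self.mapping.get(lpid) == (block, offset)

    def _valid_count(self, block: int) -> int:
        return sum(
            self._is_valid(block, off, lpid)
            for off, (lpid, _) in enumerate(self.blocks[block])
        )

    def _place(self, lpid: int, data: bytes) -> None:
        # Out-of-place write: append to the active block, then remap.
        offset = len(self.blocks[self.active])
        self.blocks[self.active].append((lpid, data))
        self.mapping[lpid] = (self.active, offset)

    def _append(self, lpid: int, data: bytes) -> None:
        if len(self.blocks[self.active]) == self.pages_per_block:
            self._open_new_block()
        self._place(lpid, data)

    def _open_new_block(self) -> None:
        for block, slots in enumerate(self.blocks):
            if not slots:
                self.active = block
                return
        self._garbage_collect()

    def _garbage_collect(self) -> None:
        # Greedy policy (an assumption): reclaim the block with the fewest live pages.
        victim = min(range(len(self.blocks)), key=self._valid_count)
        if self._valid_count(victim) == self.pages_per_block:
            raise RuntimeError("device full: no invalid pages to reclaim")
        live = [
            (lpid, data)
            for off, (lpid, data) in enumerate(self.blocks[victim])
            if self._is_valid(victim, off, lpid)
        ]
        self.blocks[victim] = []  # erase the victim block
        self.active = victim
        for lpid, data in live:   # rewrite its live pages from controller memory
            self._place(lpid, data)
            self.gc_moves += 1


ftl = BatchFTL(num_blocks=3, pages_per_block=4)
ftl.write_batch([(i, f"v0-{i}".encode()) for i in range(8)])      # fills blocks 0 and 1
ftl.write_batch([(i, f"v1-{i}".encode()) for i in (0, 1, 2, 4)])  # overwrites land in block 2
ftl.write_batch([(9, b"new")])                                    # no free block: GC runs
print(ftl.read(3), ftl.read(4), ftl.read(9), ftl.gc_moves)        # b'v0-3' b'v1-4' b'new' 1
```

In this toy version GC buffers a victim's live pages in controller memory and rewrites them into the freshly erased block, which avoids reserving a spare block. A production FTL would more likely keep over-provisioned free blocks and relocate into those.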


Figs. 1–14 (images not included)

Notes

  1. Recently, many robust open-source FTLs have been released. Pblk [2], for example, implements a full-fledged, host-based FTL that exposes a traditional block I/O interface; it was released as part of Linux kernel 4.12. Intel released a user-space FTL as part of SPDK [41]. These FTLs, however, must remain generic: they are not meant to be modified to support application-specific code.

  2. The source code of OXBlock is available at https://github.com/DFC-OpenSource/ox-ctrl.

  3. Note that the maximum size of an IPv4 datagram, the basic transfer unit of a packet-switched network, is 65,535 bytes, including a 20-byte header followed by the data area.

  4. Because BwTree mapping table entries are persisted by a host-based checkpoint, BwTree_Block had to write an additional 3.5 MB of data during the checkpoint.

  5. In some cases where the entire dataset fits into memory, we observed no performance drop, because SSD usage during the experiment was not high enough to meet the configured GC trigger condition.

  6. We observed similar behavior when running the read-heavy workload.

  7. Note that the GC process on the SSD controller moves both User- and Meta-type data (Sect. 8.3).
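The datagram bound in note 3 can be checked directly: the IPv4 Total Length field is 16 bits wide, so a datagram, header included, is at most 2^16 − 1 bytes. If, purely as an illustrative assumption not stated in these notes, the data area were filled with 4 KiB pages, at most 15 whole pages would fit:

```latex
\begin{aligned}
L_{\max} &= 2^{16} - 1 = 65{,}535 \text{ bytes} \\
D_{\max} &= L_{\max} - 20 = 65{,}515 \text{ bytes} \\
\left\lfloor D_{\max} / 4{,}096 \right\rfloor &= 15
  \quad (15 \times 4{,}096 = 61{,}440 \le 65{,}515 < 16 \times 4{,}096 = 65{,}536)
\end{aligned}
```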

References

  1. Bae, D.-H., Jo, I., Choi, Y.A., Hwang, J.-Y., Cho, S., Lee, D.-G., Jeong, J.: 2B-SSD: the case for dual, byte-and block-addressable solid-state drives. In: 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 425–438. IEEE (2018)

  2. Bjørling, M., González, J., Bonnet, P.: LightNVM: the Linux open-channel SSD subsystem. In: 15th USENIX Conference on File and Storage Technologies (FAST 17), pp. 359–374 (2017)

  3. Bonnet, P.: What’s up with the storage hierarchy? In: CIDR (2017)

  4. Chung, T.-S., Park, D.-J., Park, S., Lee, D.-H., Lee, S.-W., Song, H.-J.: A survey of flash translation layer. J. Syst. Archit. 55(5–6), 332–343 (2009)

  5. Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp. 143–154. ACM (2010)

  6. Cornwell, M.: Anatomy of a solid-state drive. Commun. ACM 55(12), 59–63 (2012)

  7. Diaconu, C., Freedman, C., Ismert, E., Larson, P.-A., Mittal, P., Stonecipher, R., Verma, N., Zwilling, M.: Hekaton: SQL Server’s memory-optimized OLTP engine. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 1243–1254 (2013)

  8. Do, J., Kee, Y.-S., Patel, J.M., Park, C., Park, K., DeWitt, D.J.: Query processing on smart SSDs: opportunities and challenges. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 1221–1230. ACM (2013)

  9. Do, J., Lomet, D., Picoli, I.L.: Improving CPU I/O performance via SSD controller FTL support for batched writes. In: Proceedings of the 15th International Workshop on Data Management on New Hardware, pp. 1–8 (2019)

  10. Do, J., Sengupta, S., Swanson, S.: Programmable solid-state storage in future cloud datacenters. Commun. ACM 62(6), 54–62 (2019)

  11. Eideticom: https://www.eideticom.com/

  12. González, J., Bjørling, M.: Multi-tenant I/O isolation with open-channel SSDs. In: Nonvolatile Memory Workshop (NVMW) (2017)

  13. Gray, J.: Put everything in future (disk) controllers. NASD Workshop (1998)

  14. Gu, B., Yoon, A.S., Bae, D.-H., Jo, I., Lee, J., Yoon, J., Kang, J.-U., Kwon, M., Yoon, C., Cho, S. et al.: Biscuit: a framework for near-data processing of big data workloads. In: ACM SIGARCH Computer Architecture News, vol. 44, pp. 153–165. IEEE Press (2016)

  15. Guo, C., Wu, H., Deng, Z., Soni, G., Ye, J., Padhye, J., Lipshteyn, M.: RDMA over commodity ethernet at scale. In: Proceedings of the 2016 ACM SIGCOMM Conference, pp. 202–215 (2016)

  16. Hao, M., Soundararajan, G., Kenchammana-Hosekote, D., Chien, A.A., Gunawi, H.S.: The tail at store: a revelation from millions of hours of disk and SSD deployments. In: 14th USENIX Conference on File and Storage Technologies (FAST 16), pp. 263–276 (2016)

  17. Hu, X.-Y., Eleftheriou, E., Haas, R., Iliadis, I., Pletka, R.: Write amplification analysis in flash-based solid state drives. In: Proceedings of SYSTOR 2009: the Israeli Experimental Systems Conference, pp. 1–9 (2009)

  18. Huang, J., Badam, A., Caulfield, L., Nath, S., Sengupta, S., Sharma, B., Qureshi, M.K.: FlashBlox: achieving both performance isolation and uniform lifetime for virtualized SSDs. In: 15th USENIX Conference on File and Storage Technologies (FAST 17), pp. 375–390 (2017)

  19. Jin, Y., Tseng, H.-W., Papakonstantinou, Y., Swanson, S.: KAML: a flexible, high-performance key-value SSD. In: 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 373–384. IEEE (2017)

  20. Jo, I., Bae, D.-H., Yoon, A.S., Kang, J.-U., Cho, S., Lee, D.D., Jeong, J.: YourSQL: a high-performance database system leveraging in-storage computing. Proc. VLDB Endow. 9(12), 924–935 (2016)

  21. Kim, J., Lee, D., Noh, S.H.: Towards SLO complying SSDs through OPS isolation. In: 13th USENIX Conference on File and Storage Technologies (FAST 15), pp. 183–189 (2015)

  22. Lee, C., Sim, D., Hwang, J., Cho, S.: F2FS: a new file system for flash storage. In: 13th USENIX Conference on File and Storage Technologies (FAST 15), pp. 273–286 (2015)

  23. Leis, V., Haubenschild, M., Neumann, T.: Optimistic lock coupling: a scalable and efficient general-purpose synchronization method. IEEE Data Eng. Bull. 42(1), 73–84 (2019)

  24. Levandoski, J.J., Lomet, D.B., Sengupta, S.: The Bw-tree: a B-tree for new hardware platforms. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp. 302–313. IEEE (2013)

  25. Levandoski, J., Lomet, D., Zhao, K.K.: Deuteronomy: transaction support for cloud data (2011)

  26. Levandoski, J., Lomet, D., Sengupta, S.: LLAMA: a cache/storage subsystem for modern hardware. Proc. VLDB Endow. 6(10), 877–888 (2013)

  27. Lomet, D.: Cost/performance in modern data stores: how data caching systems succeed. In: Proceedings of the 14th International Workshop on Data Management on New Hardware, pp. 1–10 (2018)

  28. Lu, Y., Shu, J., Zheng, W.: Extending the lifetime of flash-based storage through reducing write amplification from file systems. In: Presented as part of the 11th USENIX Conference on File and Storage Technologies (FAST 13), pp. 257–270 (2013)

  29. Microsoft Azure Cosmos DB: https://azure.microsoft.com/en-us/services/cosmos-db/

  30. Microsoft Denali: https://azure.microsoft.com/en-us/blog/microsoft-creates-industry-standards-for-datacenter-hardware-storage-and-security/

  31. Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P.: ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Datab. Syst. TODS 17(1), 94–162 (1992)

  32. NGD Systems: https://www.ngdsystems.com/

  33. NVMe Specifications: https://nvmexpress.org/resources/specifications/

  34. Park, K., Kee, Y.-S., Patel, J.M., Do, J., Park, C., DeWitt, D.J.: Query processing on smart SSDs. IEEE Data Eng. Bull. 37(2), 19–26 (2014)

  35. Picoli, I.L., Hedam, N., Tözün, P., Bonnet, P.: Open-channel SSD (what is it good for). In: CIDR (2020)

  36. Rosenblum, M., Ousterhout, J.K.: The design and implementation of a log-structured file system. ACM Trans. Comput. Syst. TOCS 10(1), 26–52 (1992)

  37. Samsung SmartSSD: https://samsungsemiconductor-us.com/smartssd/index.html/

  38. ScaleFlux: https://scaleflux.com/

  39. Seshadri, S., Gahagan, M., Bhaskaran, S., Bunker, T., De, A., Jin, Y., Liu, Y., Swanson, S.: Willow: a user-programmable SSD. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 67–80 (2014)

  40. SNIA Computational Storage: https://www.snia.org/computational/

  41. SPDK FTL: https://spdk.io/doc/ftl.html/

  42. Wang, P., Sun, G., Jiang, S., Ouyang, J., Lin, S., Zhang, C., Cong, J.: An efficient design and implementation of LSM-tree based key-value store on open-channel SSD. In: Proceedings of the Ninth European Conference on Computer Systems, p. 16. ACM (2014)

  43. Xu, J., Swanson, S.: NOVA: a log-structured file system for hybrid volatile/non-volatile main memories. In: 14th USENIX Conference on File and Storage Technologies (FAST 16), pp. 323–338 (2016)

  44. Zhang, J., Lu, Y., Shu, J., Qin, X.: FlashKV: accelerating KV performance with open-channel SSDs. ACM Trans. Embed. Comput. Syst. TECS 16(5s), 139 (2017)

  45. Zhu, F.: Toward the large deployment of open channel SSD. Flash Memory Summit (2019)

  46. Zhu, Y., Eran, H., Firestone, D., Guo, C., Lipshteyn, M., Liron, Y., Padhye, J., Raindel, S., Yahia, M.H., Zhang, M.: Congestion control for large-scale RDMA deployments. ACM SIGCOMM Comput. Commun. Rev. 45(4), 523–536 (2015)

Author information

Correspondence to Jaeyoung Do.

Cite this article

Do, J., Picoli, I.L., Lomet, D. et al. Better database cost/performance via batched I/O on programmable SSD. The VLDB Journal 30, 403–424 (2021). https://doi.org/10.1007/s00778-020-00648-z

  • DOI: https://doi.org/10.1007/s00778-020-00648-z