Finding the optimal execution scheme of external mergesort on solid state drives

World Wide Web

Abstract

Flash-based solid-state drives (SSDs) are gradually replacing mechanical hard disk drives (HDDs) as the mainstream storage medium. Unlike HDDs, SSDs have rich internal parallelism, which gives them performance characteristics that HDDs lack. External mergesort, the classical external sorting algorithm adopted in many systems and algorithms, has an important impact on overall performance, so improving its efficiency is of great significance. However, there has been relatively little research on optimizing the raw external mergesort algorithm on SSDs. Based on the characteristics of SSDs, this paper therefore proposes the SortDecision algorithm, which calculates the optimal execution scheme of external mergesort: the merging way, the read buffer size, and the write buffer size, which together determine how external mergesort executes. With this optimal execution scheme, external mergesort achieves better efficiency. In the SortDecision algorithm, the external mergesort problem on SSDs is formalized and transformed into a piecewise convex optimization problem. The optimal external mergesort scheme is then obtained by enumerating the solutions of each convex subproblem. The experimental results show that, when limited memory is provided, external mergesort guided by the SortDecision algorithm achieves a speedup of 1 to 6.7 times over the traditional external mergesort algorithm. The richer the internal parallelism resources of the SSD, the greater the acceleration achieved by SortDecision.
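To make the three parameters of an execution scheme concrete, the following is a minimal in-memory sketch of external mergesort parameterized by merging way, read buffer size, and write buffer size. It is not the SortDecision algorithm and contains no SSD cost model. The function names, the simulated I/O counters, and the memory constraint `k * read_buf + write_buf <= mem_records` are assumptions made for illustration only.

```python
import heapq
from typing import Dict, Iterator, List, Tuple


def buffered_reader(run: List[int], read_buf: int, stats: Dict[str, int]) -> Iterator[int]:
    """Yield the records of one sorted run, read_buf records per simulated read."""
    for start in range(0, len(run), read_buf):
        stats["reads"] += 1  # one simulated read request per buffer refill
        yield from run[start:start + read_buf]


def merge_runs(runs: List[List[int]], read_buf: int, write_buf: int,
               stats: Dict[str, int]) -> List[int]:
    """Merge sorted runs into one run, flushing write_buf records per simulated write."""
    out: List[int] = []
    buf: List[int] = []
    readers = [buffered_reader(r, read_buf, stats) for r in runs]
    for rec in heapq.merge(*readers):
        buf.append(rec)
        if len(buf) == write_buf:
            out.extend(buf)
            buf.clear()
            stats["writes"] += 1
    if buf:  # flush the partially filled write buffer
        out.extend(buf)
        stats["writes"] += 1
    return out


def external_mergesort(data: List[int], mem_records: int, k: int,
                       read_buf: int, write_buf: int) -> Tuple[List[int], Dict[str, int]]:
    """Sort data with a k-way merge; return the sorted list and simulated I/O counts."""
    # Assumed constraint: k read buffers plus one write buffer must fit in memory.
    assert k >= 2 and k * read_buf + write_buf <= mem_records
    stats = {"reads": 0, "writes": 0, "passes": 0}
    # Run generation: sort memory-sized chunks to form the initial runs.
    runs = [sorted(data[i:i + mem_records]) for i in range(0, len(data), mem_records)]
    # Merge phase: merge groups of k runs until a single run remains.
    while len(runs) > 1:
        runs = [merge_runs(runs[i:i + k], read_buf, write_buf, stats)
                for i in range(0, len(runs), k)]
        stats["passes"] += 1
    return (runs[0] if runs else []), stats
```

Under a fixed memory budget, a larger merging way reduces the number of merge passes but leaves less room for each read buffer, so each pass issues more, smaller read requests. This kind of trade-off is what a search over execution schemes has to balance.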

References

  1. MySQL. [Online]. Available: https://www.mysql.com/ (Accessed on: Jun. 2020)

  2. PostgreSQL. [Online]. Available: https://www.postgresql.org/ (Accessed on: Jun. 2020)

  3. Andreou, P., Spanos, O., Zeinalipour-Yazti, D., Samaras, G., Chrysanthis, P.: Fsort: External sorting on flash-based sensor devices. In: ACM International Conference Proceeding Series, pp. 1–6 (2009)

  4. Chen, F., Hou, B., Lee, R.: Internal parallelism of flash memory-based solid-state drives. ACM Trans. Storage (TOS) 12(3), 1–39 (2016)

  5. Chen, F., Lee, R., Zhang, X.: Essential roles of exploiting internal parallelism of flash memory based solid state drives in high-speed data processing. In: 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp. 266–277. IEEE (2011)

  6. Chen Yubiao Li Jianzhong, L.Y.L.F.G.H.: R-tree optimization method using internal parallelism of flash memory-based solid-state drives. Journal of Computer Research and Development 55(9), 2066 (2018)

  7. Cossentine, T., Lawrence, R.: Efficient external sorting on flash memory embedded devices. International Journal of Database Management Systems 5(1), 1 (2013)

  8. Council, T.P.P.: TPC-H benchmark specification. [Online]. Available: http://www.tpc.org/tpch/ (Accessed on: Jun. 2019)

  9. Graefe, G.: Implementing sorting in database systems. ACM Computing Surveys (CSUR) 38(3), 10–es (2006)

  10. Groppe, S., Groppe, J.: External sorting for index construction of large semantic web databases. In: S.Y. Shin, S. Ossowski, M. Schumacher, M.J. Palakal, C.C. Hung (eds.) Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), Sierre, Switzerland, March 22-26, pp. 1373–1380. ACM (2010)

  11. Hu, Y., Jiang, H., Feng, D., Tian, L., Luo, H., Ren, C.: Exploring and exploiting the multilevel parallelism inside ssds for improved performance and endurance. IEEE Trans. Comput. 62(6), 1141–1155 (2012)

  12. Jackson, R., Lawrence, R.: Faster sorting for flash memory embedded devices. In: 2019 IEEE Canadian Conference of Electrical and Computer Engineering, pp. 1–5. IEEE (2019)

  13. Laga, A., Boukhobza, J., Singhoff, F., Koskas, M.: Montres: merge on-the-run external sorting algorithm for large data volumes on ssd based storage systems. IEEE Trans. Comput. 66(10), 1689–1702 (2017)

  14. Lee, J., Roh, H., Park, S.: External mergesort for flash-based solid state drives. IEEE Trans. Comput. 65(5), 1518–1527 (2015)

  15. Li, H., Hao, M., Tong, M.H., Sundararaman, S., Bjørling, M., Gunawi, H.S.: The case of femu: Cheap, accurate, scalable and extensible flash emulator. In: Proceedings of 16th USENIX Conference on File and Storage Technologies (FAST). Oakland, CA (2018)

  16. Liu, Y., He, Z., Chen, Y.P.P., Nguyen, T.: External sorting on flash memory via natural page run generation. Comput. J. 54(11), 1882–1990 (2011)

  17. Park, H., Shim, K.: Fast: Flash-aware external sorting for mobile database systems. J. Syst. Softw. 82(8), 1298–1312 (2009)

  18. Park, S., Seo, E., Shin, J., Maeng, S., Lee, J.: Exploiting internal parallelism of flash-based ssds. IEEE Comput. Archit. Lett. 9(1), 9–12 (2010)

  19. Roh, H., Park, S., Kim, S., Shin, M., Lee, S.: B+-tree index optimization by exploiting internal parallelism of flash-based solid state drives. Proc. VLDB Endowment 5(4), 286–297 (2011)

  20. Wang, P., Sun, G., Jiang, S., Ouyang, J., Lin, S., Zhang, C., Cong, J.: An efficient design and implementation of lsm-tree based key-value store on open-channel ssd. In: Proceedings of the Ninth European Conference on Computer Systems, pp. 1–14 (2014)

  21. Wen-Yu, F.Y.L.L., Xiao-Feng, M.: Database table scan and aggregation by exploiting internal parallelism of ssds. Chinese J. Comput. 35(11), 2327–2336 (2012)

  22. Wu, C.H., Huang, K.Y.: Data sorting in flash memory. ACM Trans. Storage (TOS) 11(2), 1–25 (2015)

  23. Yang, J., Fung, G.P.C., Lu, W., Zhou, X., Chen, H., Du, X.: Finding superior skyline points for multidimensional recommendation applications. World Wide Web 15(1), 33–60 (2012)

  24. Zhang, J., Shu, J., Lu, Y.: Parafs: A log-structured file system to exploit the internal parallelism of flash devices. In: Proc. of the 14th USENIX Conf. on File and Storage Technologies, pp. 87–100 (2016)

  25. Zheng, D., Mhembere, D., Burns, R.C., Vogelstein, J.T., Priebe, C.E., Szalay, A.S.: Flashgraph: Processing billion-node graphs on an array of commodity ssds. In: Schindler, J., Zadok, E. (eds.) Proceedings of the 13th USENIX Conference on File and Storage Technologies, FAST 2015, Santa Clara, CA, USA, February 16-19, pp. 45–58. USENIX Association (2015)

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. U1811461, No. 61832003, and No. 61732003).

Author information

Contributions

Yubiao Chen proposed the original idea and was responsible for the experiments and paper writing. Jianzhong Li helped to organize the writing of the introduction and the design of the experiments. Hong Gao helped to check the logic of the whole paper and gave suggestions for revision.

Corresponding author

Correspondence to Yubiao Chen.

Ethics declarations

Conflicts of interest/Competing interests

The authors declare that they have no conflict of interest.

  • The authors have no relevant financial or non-financial interests to disclose.

  • The authors have no conflicts of interest to declare that are relevant to the content of this article.

  • All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

  • The authors have no financial or proprietary interests in any material discussed in this article.

Additional information

Availability of data and material

TPC-H is a standard benchmark. It can be downloaded from the official website [8].

Code Availability

Source code is available at: https://github.com/cyb3727/SortDecision1.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, Y., Li, J. & Gao, H. Finding the optimal execution scheme of external mergesort on solid state drives. World Wide Web 24, 781–804 (2021). https://doi.org/10.1007/s11280-021-00872-9

  • DOI: https://doi.org/10.1007/s11280-021-00872-9
