Abstract
Flash-based solid-state drives (SSDs) are gradually replacing mechanical hard disk drives (HDDs) as the mainstream storage medium. Unlike HDDs, SSDs have rich internal parallelism, which gives them performance characteristics that HDDs lack. External mergesort, the classical external sorting algorithm, is used in many systems and algorithms and has an important impact on their overall performance, so improving its efficiency is of great significance. However, relatively little research has addressed optimizing the raw external mergesort algorithm on SSDs. Based on the characteristics of SSDs, this paper proposes the SortDecision algorithm, which computes the optimal execution scheme of external mergesort: the merge way, the read buffer size, and the write buffer size, which together determine how the external mergesort executes. Following this optimal scheme, external mergesort achieves better efficiency. SortDecision formalizes the external mergesort problem on SSDs and transforms it into a piecewise convex optimization problem; the optimal scheme is then obtained by enumerating the solutions of the convex subproblems. Experimental results show that, when memory is limited, external mergesort guided by SortDecision achieves a speedup of 1\(\sim \)6.7 over the traditional external mergesort algorithm. The richer the internal parallelism of the SSD, the greater the acceleration SortDecision delivers.
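To make the search space concrete, the following is a minimal sketch and not the paper's SortDecision algorithm. It brute-forces the three parameters named in the abstract (merge way, read buffer size, write buffer size) under a fixed memory budget and scores each candidate with a hypothetical cost model. The function names, the constraint `k * r + w == memory_pages`, the `page_cost` formula, and the default constants are all assumptions made for illustration.

```python
import math


def merge_passes(num_runs: int, k: int) -> int:
    """Number of k-way merge passes needed to reduce num_runs runs to one."""
    passes = 0
    while num_runs > 1:
        num_runs = math.ceil(num_runs / k)
        passes += 1
    return passes


def page_cost(buffer_pages: int, base: float, parallelism: int) -> float:
    """Hypothetical per-page I/O cost (an assumption, not a measured model).

    A request of buffer_pages pages is served in batches of at most
    `parallelism` pages, and each batch costs `base` time units.
    """
    batches = math.ceil(buffer_pages / parallelism)
    return base * batches / buffer_pages


def best_scheme(num_runs, data_pages, memory_pages,
                parallelism=8, read_base=1.0, write_base=2.0):
    """Enumerate (k, r, w) with k * r + w == memory_pages and return the
    cheapest candidate as a tuple (cost, k, r, w).

    k: merge way, r: pages per read buffer, w: pages in the write buffer.
    Requires memory_pages >= 3 so that at least one candidate exists.
    """
    best = None
    for k in range(2, memory_pages):
        for r in range(1, (memory_pages - 1) // k + 1):
            w = memory_pages - k * r  # remaining memory goes to the write buffer
            per_page = (page_cost(r, read_base, parallelism)
                        + page_cost(w, write_base, parallelism))
            # Every merge pass reads and writes the whole data set once.
            cost = merge_passes(num_runs, k) * data_pages * per_page
            if best is None or cost < best[0]:
                best = (cost, k, r, w)
    return best


if __name__ == "__main__":
    cost, k, r, w = best_scheme(num_runs=64, data_pages=10_000, memory_pages=64)
    print(f"merge way={k}, read buffer={r} pages, write buffer={w} pages, cost={cost:.1f}")
```

Exhaustive enumeration is used here only because it is easy to verify. The abstract describes a different route, in which the problem is treated as piecewise convex and only the solutions of the convex subproblems are enumerated.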
References
MySQL. [Online]. Available: https://www.mysql.com/ (Accessed on: Jun. 2020)
PostgreSQL. [Online]. Available: https://www.postgresql.org/ (Accessed on: Jun. 2020)
Andreou, P., Spanos, O., Zeinalipour-Yazti, D., Samaras, G., Chrysanthis, P.: Fsort: External sorting on flash-based sensor devices. In: ACM International Conference Proceeding Series, pp. 1–6 (2009)
Chen, F., Hou, B., Lee, R.: Internal parallelism of flash memory-based solid-state drives. ACM Trans. Storage (TOS) 12(3), 1–39 (2016)
Chen, F., Lee, R., Zhang, X.: Essential roles of exploiting internal parallelism of flash memory based solid state drives in high-speed data processing. In: 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp. 266–277. IEEE (2011)
Chen Yubiao Li Jianzhong, L.Y.L.F.G.H.: R-tree optimization method using internal parallelism of flash memory-based solid-state drives. Journal of Computer Research and Development 55(9), 2066 (2018)
Cossentine, T., Lawrence, R.: Efficient external sorting on flash memory embedded devices. International Journal of Database Management Systems 5(1), 1 (2013)
Council, T.P.P.: TPC-H benchmark specification. [Online]. Available: http://www.tpc.org/tpch/ (Accessed on: Jun. 2019)
Graefe, G.: Implementing sorting in database systems. ACM Computing Surveys (CSUR) 38(3), 10–es (2006)
Groppe, S., Groppe, J.: External sorting for index construction of large semantic web databases. In: S.Y. Shin, S. Ossowski, M. Schumacher, M.J. Palakal, C.C. Hung (eds.) Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), Sierre, Switzerland, March 22-26, pp. 1373–1380. ACM (2010)
Hu, Y., Jiang, H., Feng, D., Tian, L., Luo, H., Ren, C.: Exploring and exploiting the multilevel parallelism inside ssds for improved performance and endurance. IEEE Trans. Comput. 62(6), 1141–1155 (2012)
Jackson, R., Lawrence, R.: Faster sorting for flash memory embedded devices. In: 2019 IEEE Canadian Conference of Electrical and Computer Engineering, pp. 1–5. IEEE (2019)
Laga, A., Boukhobza, J., Singhoff, F., Koskas, M.: Montres: merge on-the-run external sorting algorithm for large data volumes on ssd based storage systems. IEEE Trans. Comput. 66(10), 1689–1702 (2017)
Lee, J., Roh, H., Park, S.: External mergesort for flash-based solid state drives. IEEE Trans. Comput. 65(5), 1518–1527 (2015)
Li, H., Hao, M., Tong, M.H., Sundararaman, S., Bjørling, M., Gunawi, H.S.: The case of femu: Cheap, accurate, scalable and extensible flash emulator. In: Proceedings of 16th USENIX Conference on File and Storage Technologies (FAST). Oakland, CA (2018)
Liu, Y., He, Z., Chen, Y.P.P., Nguyen, T.: External sorting on flash memory via natural page run generation. Comput. J. 54(11), 1882–1990 (2011)
Park, H., Shim, K.: Fast: Flash-aware external sorting for mobile database systems. J. Syst. Softw. 82(8), 1298–1312 (2009)
Park, S., Seo, E., Shin, J., Maeng, S., Lee, J.: Exploiting internal parallelism of flash-based ssds. IEEE Comput. Archit. Lett. 9(1), 9–12 (2010)
Roh, H., Park, S., Kim, S., Shin, M., Lee, S.: B+-tree index optimization by exploiting internal parallelism of flash-based solid state drives. Proc. VLDB Endowment 5(4), 286–297 (2011)
Wang, P., Sun, G., Jiang, S., Ouyang, J., Lin, S., Zhang, C., Cong, J.: An efficient design and implementation of lsm-tree based key-value store on open-channel ssd. In: Proceedings of the Ninth European Conference on Computer Systems, pp. 1–14 (2014)
Wen-Yu, F.Y.L.L., Xiao-Feng, M.: Database table scan and aggregation by exploiting internal parallelism of ssds. Chinese J. Comput. 35(11), 2327–2336 (2012)
Wu, C.H., Huang, K.Y.: Data sorting in flash memory. ACM Trans. Storage (TOS) 11(2), 1–25 (2015)
Yang, J., Fung, G.P.C., Lu, W., Zhou, X., Chen, H., Du, X.: Finding superior skyline points for multidimensional recommendation applications. World Wide Web 15(1), 33–60 (2012)
Zhang, J., Shu, J., Lu, Y.: Parafs: a log-structured file system to exploit the internal parallelism of flash devices. In: Proc. of the 14th USENIX Conf on File and Storage Technologies, pp. 87–100 (2016)
Zheng, D., Mhembere, D., Burns, R.C., Vogelstein, J.T., Priebe, C.E., Szalay, A.S.: Flashgraph: Processing billion-node graphs on an array of commodity ssds. In: Schindler, J., Zadok, E. (eds.) Proceedings of the 13th USENIX Conference on File and Storage Technologies, FAST 2015, Santa Clara, CA, USA, February 16-19, pp. 45–58. USENIX Association (2015)
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. U1811461, No. 61832003, and No. 61732003).
Funding
This work was supported by the National Natural Science Foundation of China (No. U1811461, No. 61832003, and No. 61732003).
Author information
Contributions
Yubiao Chen proposed the original idea and was responsible for the experiments and the writing of the paper. Jianzhong Li helped to organize the introduction and design the experiments. Hong Gao helped to check the logic of the whole paper and gave suggestions for revision.
Ethics declarations
Conflicts of interest/Competing interests
The authors declare that they have no conflict of interest.
The authors have no relevant financial or non-financial interests to disclose.
The authors have no conflicts of interest to declare that are relevant to the content of this article.
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
The authors have no financial or proprietary interests in any material discussed in this article.
Additional information
Availability of data and material
TPC-H is a standard benchmark. It can be downloaded from the official website [8].
Code Availability
Source code is available at: https://github.com/cyb3727/SortDecision1.
Cite this article
Chen, Y., Li, J. & Gao, H. Finding the optimal execution scheme of external mergesort on solid state drives. World Wide Web 24, 781–804 (2021). https://doi.org/10.1007/s11280-021-00872-9
DOI: https://doi.org/10.1007/s11280-021-00872-9