Correct and stable sorting for overflow streaming data with a limited storage size and a uniprocessor
PeerJ Computer Science ( IF 3.8 ) Pub Date : 2021-02-12 , DOI: 10.7717/peerj-cs.355
Suluk Chaikhan , Suphakant Phimoltares , Chidchanok Lursinsap

Tremendous quantities of numeric data are generated as streams in various cyber ecosystems. Sorting is one of the most fundamental operations for gaining knowledge from data. However, because data storage, both inside and outside the CPU, is limited in size relative to massive streaming data sources, the data can easily overflow the storage. Consequently, all classic sorting algorithms are unable to obtain a correct sorted sequence, because the data to be sorted cannot be held in the storage in their entirety. This paper proposes a new sorting algorithm, called streaming data sort, for streaming data on a uniprocessor under two constraints: a limited storage size and a correct sorted order. Data flow continuously into the storage as consecutive chunks, each smaller than the storage size. A theoretical analysis of the space bound and the time complexity is provided. The sorting time complexity is O(n), where n is the number of incoming data items, and the space complexity is O(M), where M is the storage size. The experimental results show that streaming data sort can handle one million permuted data items using a storage whose size is set as low as 35% of the data size. The proposed concept can be applied in practice to applications in various fields where the data always overflow the working storage and sorting is required.
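The abstract does not spell out how streaming data sort works, so the following is only an illustrative sketch of the setting it describes: chunks smaller than a storage budget arrive one at a time, and the sorted order must be maintained without ever holding all n values. It assumes distinct integers (such as a permutation) and keeps them as runs of consecutive values, so that a permutation can collapse into a few runs. This run-compression idea and every name in the code (`stream_sort`, `insert_value`, `expand`, `StorageOverflow`, `capacity`) are hypothetical choices for illustration, not the authors' algorithm, and the sketch does not reproduce the paper's O(n) time or O(M) space guarantees (each list insertion costs time proportional to the number of stored runs).

```python
from bisect import bisect_left
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical representation: a run is an inclusive range [lo, hi] of
# consecutive integers that have already arrived. One run = one storage unit.
Run = Tuple[int, int]


class StorageOverflow(Exception):
    """Raised when the buffered chunk plus the stored runs exceed the budget."""


def insert_value(runs: List[Run], x: int) -> None:
    """Insert distinct integer x into sorted, disjoint, non-adjacent runs."""
    i = bisect_left(runs, (x, x))  # runs[:i] start below x, runs[i:] start at or above x
    if (i > 0 and runs[i - 1][1] >= x) or (i < len(runs) and runs[i][0] == x):
        raise ValueError(f"duplicate value {x}; this sketch assumes distinct data")
    join_left = i > 0 and runs[i - 1][1] == x - 1
    join_right = i < len(runs) and runs[i][0] == x + 1
    if join_left and join_right:
        runs[i - 1] = (runs[i - 1][0], runs[i][1])  # x bridges two runs into one
        del runs[i]
    elif join_left:
        runs[i - 1] = (runs[i - 1][0], x)  # extend the run below x
    elif join_right:
        runs[i] = (x, runs[i][1])  # extend the run above x
    else:
        runs.insert(i, (x, x))  # x starts a new run


def stream_sort(chunks: Iterable[Sequence[int]], capacity: int) -> List[Run]:
    """Consume chunks one at a time, never storing more than `capacity` units."""
    runs: List[Run] = []
    for chunk in chunks:
        if len(chunk) >= capacity:
            raise ValueError("each chunk must be smaller than the storage size")
        # The incoming chunk and the stored runs must fit in storage together;
        # merging can only keep or shrink that total, so runs never exceed it.
        if len(runs) + len(chunk) > capacity:
            raise StorageOverflow(f"{len(runs)} runs + {len(chunk)} new items > {capacity}")
        for x in chunk:
            insert_value(runs, x)
    return runs


def expand(runs: Iterable[Run]) -> Iterator[int]:
    """Regenerate the fully sorted sequence lazily from the compact runs."""
    for lo, hi in runs:
        yield from range(lo, hi + 1)


if __name__ == "__main__":
    data = [5, 3, 1, 2, 4, 9, 7, 8, 6, 10]
    chunks = [data[i:i + 3] for i in range(0, len(data), 3)]  # chunks of 3 < capacity 6
    runs = stream_sort(chunks, capacity=6)
    print(runs)                # [(1, 10)]
    print(list(expand(runs)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

With this input the ten values collapse into a single run, so storage use stays small. Inputs whose values do not form long consecutive runs (for example, all odd numbers arriving first) keep many runs alive and trigger `StorageOverflow` in this sketch.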

Updated: 2021-02-12