Splitting Large Medical Data Sets based on Normal Distribution in Cloud Environment
IEEE Transactions on Cloud Computing (IF 5.3), Pub Date: 2020-04-01, DOI: 10.1109/tcc.2015.2462361
Hao Lan Zhang, Yali Zhao, Chaoyi Pang, Jinyuan He

The surge of medical and e-commerce applications has generated a tremendous amount of data, bringing people into the so-called “Big Data” era. In contrast to traditional large data sets, “Big Data” implies not only a large data volume but also a high velocity of data generation. However, current data mining and analytical techniques struggle to process large volumes of data in a short period of time. This paper explores the efficiency of using the Normal Distribution (ND) method to split and process large-volume medical data in a cloud environment, so that the split data sets retain representative information. The new ND-based model consists of two stages. The first stage adopts the ND method for splitting and processing large data sets, which reduces their volume. The second stage implements the ND-based model in a cloud computing infrastructure to allocate the split data sets. The experimental results show substantial efficiency gains for the proposed method over conventional methods that do not split data into small partitions. The ND-based method generates representative data sets, offering an efficient solution for large-scale data processing, and the split data sets can be processed in parallel in a cloud computing environment.
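
The abstract does not spell out the splitting procedure, so the following is only a minimal sketch of one plausible reading of the two stages, not the authors' method. It assumes a single numeric attribute, groups values by their distance from the mean in standard deviations, draws a proportional sample from each group to obtain a smaller representative set, and then processes the resulting partitions in parallel. The function names, the band edges (1 and 2 standard deviations), the sampling fraction, and the use of a local thread pool as a stand-in for cloud workers are all assumptions made for illustration.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def split_by_sigma_band(values, edges=(1.0, 2.0)):
    """Group values by how many standard deviations they lie from the mean.

    Hypothetical helper: the band edges are an illustrative assumption.
    Returns len(edges) + 1 lists, ordered from the innermost band outward.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    bands = [[] for _ in range(len(edges) + 1)]
    if std == 0:
        # All values are identical, so everything sits in the innermost band.
        bands[0] = list(values)
        return bands
    for v in values:
        z = abs(v - mean) / std
        index = sum(z >= edge for edge in edges)  # number of edges passed
        bands[index].append(v)
    return bands


def representative_sample(bands, fraction, seed=0):
    """Stage 1 (assumed): keep the same fraction of every band."""
    rng = random.Random(seed)
    partitions = []
    for band in bands:
        if not band:
            partitions.append([])
            continue
        k = max(1, round(len(band) * fraction))
        partitions.append(rng.sample(band, k))
    return partitions


def summarize(partition):
    """Placeholder for whatever analysis runs on one partition."""
    mean = statistics.fmean(partition) if partition else None
    return {"n": len(partition), "mean": mean}


def process_in_parallel(partitions):
    """Stage 2 (assumed): a local thread pool stands in for cloud workers."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(summarize, partitions))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, made-up readings; not data from the paper.
    readings = [rng.gauss(120.0, 15.0) for _ in range(10_000)]
    bands = split_by_sigma_band(readings)
    partitions = representative_sample(bands, fraction=0.05)
    for result in process_in_parallel(partitions):
        print(result)
```

Sampling the same fraction from every band keeps the reduced set's shape close to that of the original distribution, which is one way to read "representative" here; the paper itself may define it differently.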

Updated: 2020-04-01