Segmented In-Advance Data Analytics for Fast Scientific Discovery, IEEE Transactions on Cloud Computing


Segmented In-Advance Data Analytics for Fast Scientific Discovery
IEEE Transactions on Cloud Computing (IF 6.5), Pub Date: 2020-04-01, DOI: 10.1109/tcc.2016.2541142
Jialin Liu, Yong Chen

Scientific discovery usually involves data generation, data preprocessing, data storage, and data analysis. As the data volume exceeds a few terabytes (TB) in a single simulation run, data movement, which occurs during each cycle of scientific discovery, continues to be the bottleneck in most scientific big data applications. Much research has been conducted on reducing data movement. Among the existing efforts, and based on our previous research, reusing analysis results shows significant potential for optimizing data movement between analysis operations. In this work, we propose the Segmented In-Advance (SIA) data analytics approach for optimizing data movement, and we provide a cloud-based elastic distributed in-memory database to manage the intermediate analysis results. The fundamental idea of the SIA approach is to analyze the history of operations and predict the future analytics operations of interest. The predicted analysis operation is performed in advance on a more finely segmented dataset, and the segmented results are distributed in an in-memory key-value store for future reuse. The evaluation shows that the SIA approach achieves a 1.2X-6.1X speedup and that the in-memory distributed data store scales well. The proposed SIA approach is a promising data movement reduction solution for scientific big data applications and fast scientific discovery.
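The idea described in the abstract (predict an operation from history, run it in advance per segment, and reuse the stored segment results) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `SEGMENT_SIZE`, `predict_next_op`, `precompute`, and `query` are hypothetical, a plain dictionary stands in for the distributed in-memory key-value store, the predictor is a naive "most frequent past operation" rule, and only operations whose partial results can be merged (here `sum` and `max`) are assumed.

```python
from collections import Counter

SEGMENT_SIZE = 4  # hypothetical segment length, chosen for illustration

# Hypothetical decomposable operations: the same function is used to
# compute a per-segment result and to merge partial results.
OPS = {"sum": sum, "max": max}


def predict_next_op(history):
    """Naive stand-in predictor: return the most frequent past operation."""
    return Counter(history).most_common(1)[0][0]


def precompute(data, op, store):
    """Run `op` in advance on every full segment and cache each result."""
    for seg in range(len(data) // SEGMENT_SIZE):
        lo = seg * SEGMENT_SIZE
        store[(op, seg)] = OPS[op](data[lo:lo + SEGMENT_SIZE])


def query(data, op, start, stop, store):
    """Evaluate `op` over data[start:stop] (start < stop assumed).

    Fully covered segments found in `store` are reused; the unaligned
    edges of the range are computed from the raw data.
    Returns (result, number_of_reused_segments).
    """
    partials = []
    reused = 0
    pos = start
    while pos < stop:
        seg = pos // SEGMENT_SIZE
        seg_lo = seg * SEGMENT_SIZE
        seg_hi = seg_lo + SEGMENT_SIZE
        if pos == seg_lo and seg_hi <= stop and (op, seg) in store:
            partials.append(store[(op, seg)])  # cache hit: no raw data read
            reused += 1
            pos = seg_hi
        else:
            end = min(seg_hi, stop)
            partials.append(OPS[op](data[pos:end]))  # fall back to raw data
            pos = end
    return OPS[op](partials), reused


# Usage: a dict stands in for the in-memory key-value store.
data = list(range(16))
store = {}
op = predict_next_op(["sum", "max", "sum"])
precompute(data, op, store)
result, reused = query(data, op, 2, 14, store)
print(op, result, reused)  # sum 90 2
```

In this toy run, the two segments fully inside the requested range are served from the store, and only the four elements at the edges are read from the raw data, which is the kind of data movement saving the abstract refers to.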


Updated: 2020-04-01
