A Task-Aware Fine-Grained Storage Selection Mechanism for In-Memory Big Data Computing Frameworks
International Journal of Parallel Programming (IF 0.9), Pub Date: 2020-06-05, DOI: 10.1007/s10766-020-00662-2
Bo Wang, Jie Tang, Rui Zhang, Jialei Liu, Shaoshan Liu, Deyu Qi

In-memory big data computing, widely used in hot areas such as deep learning and artificial intelligence, meets the demands of ultra-low-latency services and real-time data analysis. However, existing in-memory computing frameworks usually use memory aggressively: memory space is quickly exhausted, leading to severe performance degradation or even task failure. Meanwhile, the growing volumes of raw and intermediate data impose huge memory demands, which further aggravate the memory shortage. To relieve the pressure on memory, these in-memory frameworks offer various storage scheme options for caching data, which determine where and how data is cached. However, their storage scheme selection mechanisms are simple and insufficient, typically set manually by users. Moreover, such coarse-grained data storage mechanisms cannot match the memory access pattern of each computing unit, which works on only part of the data. In this paper, we propose a novel task-aware, fine-grained storage scheme auto-selection mechanism. It automatically determines the storage scheme for caching each data block, the smallest unit during computing. The caching decision is made by considering future tasks, real-time resource utilization, and the storage costs under each storage scheme, including block creation costs, I/O costs, and serialization costs. Experiments show that, compared with the default storage setting, our mechanism offers substantial performance improvement, by as much as 78% in memory-constrained circumstances.
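The decision procedure the abstract describes can be illustrated with a minimal sketch. The option names, cost fields, and the linear cost model below are assumptions for illustration only, not the paper's actual formulation: each candidate scheme carries a one-time creation cost plus per-read I/O and (de)serialization costs, the per-read costs are weighted by the number of expected future reads of the block, and options that do not fit in currently free memory are skipped.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StorageOption:
    """One candidate scheme for caching a block (names are illustrative)."""
    name: str                  # e.g. "MEMORY", "MEMORY_SER", "DISK"
    needs_memory: int          # bytes of memory the cached block would occupy
    creation_cost: float       # one-time cost to materialize the cached block
    io_cost: float             # cost paid on every future read
    serialization_cost: float  # (de)serialization cost per future read

def select_storage_scheme(options: List[StorageOption],
                          free_memory: int,
                          future_reads: int) -> StorageOption:
    """Pick the feasible option with the lowest total expected cost.

    total = creation + future_reads * (io + serialization);
    options that exceed the currently free memory are skipped.
    """
    feasible = [o for o in options if o.needs_memory <= free_memory]
    if not feasible:  # nothing fits: fall back to comparing all options
        feasible = options
    return min(feasible,
               key=lambda o: o.creation_cost
                             + future_reads * (o.io_cost + o.serialization_cost))

# With ample memory the deserialized in-memory scheme wins; under memory
# pressure the selection falls back to the serialized in-memory scheme.
opts = [
    StorageOption("MEMORY",     needs_memory=100, creation_cost=1.0,
                  io_cost=0.1, serialization_cost=0.0),
    StorageOption("MEMORY_SER", needs_memory=40,  creation_cost=1.5,
                  io_cost=0.1, serialization_cost=0.5),
    StorageOption("DISK",       needs_memory=0,   creation_cost=2.0,
                  io_cost=2.0, serialization_cost=0.5),
]
print(select_storage_scheme(opts, free_memory=200, future_reads=3).name)  # MEMORY
print(select_storage_scheme(opts, free_memory=50,  future_reads=3).name)  # MEMORY_SER
```

Because the choice is re-evaluated per block against real-time free memory and per-task future reads, blocks of the same dataset can legitimately end up under different schemes, which is the fine-grained behavior the paper contrasts with coarse-grained, user-set storage levels.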
