A Dependency-Aware Storage Schema Selection Mechanism for In-Memory Big Data Computing Frameworks, International Journal of Parallel Programming


A Dependency-Aware Storage Schema Selection Mechanism for In-Memory Big Data Computing Frameworks
International Journal of Parallel Programming (IF 0.9), Pub Date: 2019-04-09, DOI: 10.1007/s10766-018-0612-8
Bo Wang, Jie Tang, Rui Zhang, Wei Ding, Deyu Qi

Artificial intelligence applications that depend heavily on deep learning and computer vision processing have become popular. Their strong demand for low-latency or real-time services makes Spark, an in-memory big data computing framework, the best choice to replace earlier disk-based big data computing. For an in-memory framework, a reasonable arrangement of data in storage is the key factor in performance. However, existing optimizations based on cache replacement strategies and storage selection mechanisms all rely on an imprecise model of available memory and will lead to poor decisions. To address this issue, we propose an available memory model that captures accurate information about the memory space that is about to be freed by sensing the dependencies between data. We also propose a maximum memory requirement model for execution prediction that excludes the redundancy from inactive blocks. With these two models, we build DASS, a dependency-aware storage selection mechanism for Spark that makes dynamic and fine-grained storage decisions. Our experiments show that, compared with previous methods, DASS effectively reduces the cost of garbage collection and RDD block re-computation, improving computing performance by 77.4%.
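
The idea of a dependency-aware available memory estimate can be illustrated with a small sketch. This is not the DASS model from the paper, whose details the abstract does not give. It is a toy example under stated assumptions: each cached block records which downstream blocks still need it, and a block with no pending dependents is treated as memory that is about to be freed. The names `Block`, `pending_children`, `available_memory`, and `choose_storage` are hypothetical and do not correspond to any Spark API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A cached data block (hypothetical stand-in for an RDD block)."""
    name: str
    size_mb: int
    # Names of downstream blocks that still have to be computed from this one.
    pending_children: frozenset = frozenset()


def available_memory(total_mb, cached_blocks):
    """Currently free memory plus memory held by blocks nobody depends on.

    Assumption: a cached block with no pending dependents can be dropped
    without triggering re-computation, so its space counts as available.
    """
    used = sum(b.size_mb for b in cached_blocks)
    reclaimable = sum(b.size_mb for b in cached_blocks if not b.pending_children)
    return (total_mb - used) + reclaimable


def choose_storage(required_mb, total_mb, cached_blocks):
    """Pick 'memory' if the new block fits the estimate, else 'disk'."""
    if required_mb <= available_memory(total_mb, cached_blocks):
        return "memory"
    return "disk"


if __name__ == "__main__":
    cache = [
        Block("a", 400, frozenset({"c"})),  # still needed to compute block c
        Block("b", 300),                    # no pending dependents
    ]
    print(available_memory(1000, cache))
    print(choose_storage(500, 1000, cache))
```

In this toy setup, an estimate that only looks at currently free memory (300 MB) would send a 500 MB block to disk, while the dependency-aware estimate also counts block `b` and keeps it in memory.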


Updated: 2019-04-09
