Near-data Prediction Based Speculative Optimization in a Distribution Environment, Mobile Networks and Applications



Mobile Networks and Applications (IF 2.3), Pub Date: 2022-06-18, DOI: 10.1007/s11036-021-01793-7
Qi Liu, Xueyan Wu, Xiaodong Liu, Yonghong Zhang, Yuemei Hu

Hadoop is an open-source project from Apache that provides a distributed file system and the MapReduce distributed computing framework. The current Apache 2.0 license agreement supports on-demand payment by consumers for cloud platform services, helping users leverage their own differing hardware to provide cloud services. In a cloud-based environment, there is a need to balance the resource requirements of workloads, optimize load performance, and manage cloud compute costs. When the processing power of clustered machines varies widely, for example when hardware is aging or overloaded, Hadoop offers a speculative execution (SE) optimization strategy: it monitors task progress in real time and, when the tasks of a job are not running at the same speed, starts identical backup tasks on different nodes. The result of whichever copy finishes first is used, which maintains the overall progress of the job. At present, the SE strategy's incorrect selection of backup nodes, together with resource constraints, may result in poor Hadoop performance and in subsequent tasks failing to complete. This paper proposes an SE optimization strategy based on near-data prediction, which analyzes real-time task execution information to predict the required running time and selects backup nodes based on actual requirements and near data, so that the SE strategy achieves the best performance. Experiments show that in a heterogeneous Hadoop environment, the optimization strategy can effectively improve the effectiveness and accuracy of various tasks and enhance the performance of cloud computing. The improved platform performance benefits consumers more than before.
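
The abstract does not give the paper's prediction model or node-selection rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple progress-rate estimate of remaining time and a hypothetical `relative_speed` score per node. All class, field, and function names are illustrative, and none of them are Hadoop APIs.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Task:
    task_id: str
    progress: float          # fraction complete, 0.0 to 1.0
    elapsed_s: float         # seconds since the task started
    node: str                # node currently running the task
    replica_nodes: Set[str]  # nodes holding a copy of the task's input split


@dataclass
class Node:
    name: str
    free_slots: int
    relative_speed: float    # hypothetical speed score, higher is faster


def estimate_remaining_s(task: Task) -> float:
    """Estimate remaining run time from the observed progress rate."""
    if task.progress <= 0.0 or task.elapsed_s <= 0.0:
        return float("inf")  # no progress yet, so no usable estimate
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


def pick_straggler(tasks: List[Task], slowdown_factor: float = 1.5) -> Optional[Task]:
    """Return the slowest task if it is well behind the median, else None.

    slowdown_factor is an assumed threshold, not a value from the paper.
    """
    if len(tasks) < 2:
        return None
    remaining = sorted(estimate_remaining_s(t) for t in tasks)
    median = remaining[len(remaining) // 2]
    candidate = max(tasks, key=estimate_remaining_s)
    if estimate_remaining_s(candidate) > slowdown_factor * median:
        return candidate
    return None


def pick_backup_node(task: Task, nodes: List[Node]) -> Optional[Node]:
    """Choose a node for the backup copy, preferring nodes near the data.

    Nodes with no free slot, and the node already running the task, are
    excluded. Among the rest, a node holding a replica of the input split
    wins over one that does not, and ties are broken by relative_speed.
    """
    eligible = [n for n in nodes if n.free_slots > 0 and n.name != task.node]
    if not eligible:
        return None
    return max(eligible, key=lambda n: (n.name in task.replica_nodes, n.relative_speed))
```

In this sketch data locality outranks raw node speed, on the assumption that reading the input split locally matters more than a faster CPU. A real scheduler would have to weigh the two against measured network and disk costs.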


Updated: 2022-06-19
