Cross-cloud MapReduce for Big Data
IEEE Transactions on Cloud Computing (IF 5.3), Pub Date: 2020-04-01, DOI: 10.1109/tcc.2015.2474385
Peng Li, Song Guo, Shui Yu, Weihua Zhuang

MapReduce plays a critical role as a leading framework for big data analytics. In this paper, we consider a geo-distributed cloud architecture that provides MapReduce services based on big data collected from end users all over the world. Existing work handles MapReduce jobs with a traditional computation-centric approach, in which all input data distributed across multiple clouds are aggregated into a virtual cluster that resides in a single cloud. The poor efficiency and high cost of this approach for big data motivate us to propose a novel data-centric architecture with three key techniques: cross-cloud virtual cluster, data-centric job placement, and network coding based traffic routing. Our design leads to an optimization framework whose objective is to minimize both the computation and transmission cost of running a set of MapReduce jobs in geo-distributed clouds. We further design a parallel algorithm by decomposing the original large-scale problem into several subproblems that can be solved in a distributed manner and are coordinated by a high-level master problem. Finally, we conduct real-world experiments and extensive simulations to show that our proposal significantly outperforms existing work.
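To make the contrast between the computation-centric and data-centric approaches concrete, the following toy sketch compares transmission cost only. It is not the paper's optimization model: the cloud names, per-GB link prices, the `shrink` ratio (intermediate data size divided by input size), and all function names are hypothetical, and computation cost and network coding are ignored.

```python
from typing import Dict, FrozenSet, Tuple

Prices = Dict[FrozenSet[str], float]  # hypothetical symmetric per-GB link prices


def aggregation_cost(data_gb: Dict[str, float], price: Prices, target: str) -> float:
    """Transfer cost of moving every other cloud's data to `target`."""
    return sum(
        size * price[frozenset((src, target))]
        for src, size in data_gb.items()
        if src != target
    )


def computation_centric(data_gb: Dict[str, float], price: Prices) -> Tuple[float, str]:
    """Ship all raw input to the single cheapest cloud, then run the job there."""
    return min((aggregation_cost(data_gb, price, t), t) for t in data_gb)


def data_centric(data_gb: Dict[str, float], price: Prices, shrink: float) -> Tuple[float, str]:
    """Run map tasks where the data lives; ship only intermediate data.

    Assumption: every map task emits `shrink` times its input size, and all
    reduce tasks run in one cloud (a simplification for illustration).
    """
    intermediate = {cloud: size * shrink for cloud, size in data_gb.items()}
    return min((aggregation_cost(intermediate, price, t), t) for t in intermediate)


# Hypothetical input sizes (GB) and link prices (cost per GB).
data_gb = {"us": 100.0, "eu": 60.0, "asia": 40.0}
price = {
    frozenset(("us", "eu")): 0.02,
    frozenset(("us", "asia")): 0.05,
    frozenset(("eu", "asia")): 0.04,
}

print("computation-centric:", computation_centric(data_gb, price))
print("data-centric:", data_centric(data_gb, price, shrink=0.25))
```

In this toy setting the data-centric cost is lower whenever `shrink` is below 1, because only the reduced intermediate data crosses cloud boundaries. If map output is larger than its input, the comparison reverses.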

Updated: 2020-04-01