Cross-MapReduce: Data transfer reduction in geo-distributed MapReduce
Future Generation Computer Systems (IF 6.2), Pub Date: 2020-09-11, DOI: 10.1016/j.future.2020.09.009
Saeed Mirpour Marzuni, Abdorreza Savadi, Adel N. Toosi, Mahmoud Naghibzadeh

The MapReduce model is widely used to store and process big data in a distributed manner. MapReduce was originally developed for a single tightly coupled cluster of computers. Approaches such as Hierarchical and Geo-Hadoop are designed to address geo-distributed MapReduce processing. However, these methods still incur high inter-cluster data transfer over the Internet, which is prohibitive for processing today’s globally distributed big data. Based on the observation that there is no need to transfer the entire intermediate results to a single global reducer, we propose Cross-MapReduce, a framework for geo-distributed MapReduce processing. Before any massive data transfer, the proposed method finds a set of best global reducers to minimize the volume of transferred data. We propose a graph, called GRG, to determine the number and locations of the global reducers. We conducted extensive experimental evaluations on a real testbed to demonstrate the effectiveness of Cross-MapReduce. The experimental results show that Cross-MapReduce significantly outperforms the Hierarchical and Geo-Hadoop approaches and reduces the amount of data transferred over the Internet by 40%.

Updated: 2020-09-11