Effect of garbage collection in iterative algorithms on Spark: an experimental analysis
The Journal of Supercomputing (IF 3.3) Pub Date: 2020-01-16, DOI: 10.1007/s11227-020-03150-z
Minseo Kang, Jae-Gil Lee

Spark is one of the most widely used systems for the distributed processing of big data. Its performance bottlenecks are mainly due to network I/O, disk I/O, and garbage collection. Previous studies quantitatively analyzed the performance impact of these bottlenecks but did not focus on iterative algorithms. In an iterative algorithm, garbage collection has a greater performance impact than in other workloads because the algorithm repeatedly loads and deletes data in main memory over multiple iterations. Spark provides three caching mechanisms, “disk cache,” “memory cache,” and “no cache,” to keep the unchanged data across iterations. In this paper, we provide an in-depth experimental analysis of how garbage collection affects overall performance under each of Spark's caching mechanisms, using various combinations of algorithms and datasets. The experimental results show that garbage collection accounts for 16–47% of the total elapsed time of running iterative algorithms on Spark and that the memory cache is no less advantageous in terms of garbage collection than the disk cache. We expect the results of this paper to serve as a guide for tuning garbage collection when running iterative algorithms on Spark.
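
The three caching mechanisms named in the abstract can be illustrated with a minimal PySpark sketch. It is not the authors' experimental code. The mapping of “memory cache” to `StorageLevel.MEMORY_ONLY` and “disk cache” to `StorageLevel.DISK_ONLY` is an assumption, since the abstract does not name the exact storage levels, and `run_iterations`, `storage_level_name`, and the input `path` are hypothetical names introduced only for illustration.

```python
from typing import Optional

# Assumed mapping from the abstract's terms to Spark storage level names.
# The abstract does not state the exact levels, so this is illustrative only.
STORAGE_LEVEL_NAMES = {
    "memory cache": "MEMORY_ONLY",
    "disk cache": "DISK_ONLY",
    "no cache": None,
}


def storage_level_name(mechanism: str) -> Optional[str]:
    """Return the assumed StorageLevel attribute name, or None for no caching."""
    return STORAGE_LEVEL_NAMES[mechanism]


def run_iterations(sc, path: str, mechanism: str, iterations: int) -> float:
    """Sum a numeric text file `iterations` times under one caching mechanism.

    `sc` is an existing SparkContext; `path` is a hypothetical input file
    holding one number per line.
    """
    # Imported lazily so the mapping above can be used without Spark installed.
    from pyspark import StorageLevel

    # The unchanged data that every iteration reuses.
    data = sc.textFile(path).map(float)

    level = storage_level_name(mechanism)
    if level is not None:
        data.persist(getattr(StorageLevel, level))

    total = 0.0
    for _ in range(iterations):
        # Without persist(), Spark recomputes `data` from the file on each action.
        total += data.sum()

    if level is not None:
        data.unpersist()
    return total
```

Calling `run_iterations(sc, path, "no cache", 10)` reloads and discards the input on every pass, which is the repeated load-and-delete pattern the abstract associates with garbage collection pressure; the other two settings keep the data persisted between iterations.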

Updated: 2020-01-16