Accelerating Spark-Based Applications with MPI and OpenACC


Complexity (IF 1.7), Pub Date: 2021-07-21, DOI: 10.1155/2021/9943289
Saeed Alshahrani, Waleed Al Shehri, Jameel Almalki, Ahmed M. Alghamdi, Abdullah M. Alammari

The amount of data produced in scientific and commercial fields is growing dramatically. Correspondingly, big data technologies such as Hadoop and Spark have emerged to tackle the challenges of collecting, processing, and storing such large-scale data. Unfortunately, big data applications usually have performance issues and do not fully exploit the hardware infrastructure. One reason is that these applications are developed in high-level programming languages that do not provide the low-level system control, and hence the performance, of highly parallel programming models such as the Message Passing Interface (MPI). Moreover, big data is considered a barrier to parallel programming models or accelerators (e.g., CUDA and OpenCL). Therefore, the aim of this study is to investigate how the performance of big data applications can be enhanced without compromising the power consumption of the hardware infrastructure. A Hybrid Spark MPI OpenACC (HSMO) system is proposed that integrates Spark, as a big data programming model, with MPI and OpenACC as parallel programming models. Such integration brings together the advantages of each programming model and provides greater effectiveness. To enhance performance without compromising power consumption, the integration approach needs to exploit the hardware infrastructure in an intelligent manner. To achieve this performance enhancement, a mapping technique is proposed that is based on the application's virtual topology as well as the physical topology of the underlying resources. To the best of our knowledge, there is no existing method in big data applications for utilizing graphics processing units (GPUs), which are now an essential part of high-performance computing (HPC) as a powerful resource for fast computation.
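
The abstract does not spell out how the mapping technique works, so the following is only a minimal sketch of the general idea of topology-aware placement, not the HSMO algorithm itself. It assumes the virtual topology is given as pairwise communication volumes between ranks and the physical topology as a number of free slots per node. The names `map_ranks_to_nodes`, `inter_node_traffic`, `comm_volume`, and `node_slots` are hypothetical and chosen purely for illustration. The greedy heuristic places the most heavily communicating rank pairs on the same node first.

```python
def map_ranks_to_nodes(comm_volume, node_slots):
    """Greedy, illustrative rank-to-node placement (not the paper's method).

    comm_volume: {(rank_a, rank_b): volume} with rank_a != rank_b
                 (assumed form of the virtual topology).
    node_slots:  {node_name: free_slots} (assumed form of the physical topology).
    Returns {rank: node_name}.
    """
    ranks = sorted({r for edge in comm_volume for r in edge})
    if len(ranks) > sum(node_slots.values()):
        raise ValueError("not enough slots for all ranks")

    free = dict(node_slots)  # remaining capacity per node
    placement = {}

    def place(rank, node):
        placement[rank] = node
        free[node] -= 1

    # Visit the heaviest communication edges first (ties broken by rank pair).
    for (a, b), _ in sorted(comm_volume.items(), key=lambda kv: (-kv[1], kv[0])):
        if a in placement and b in placement:
            continue
        if a in placement or b in placement:
            placed, other = (a, b) if a in placement else (b, a)
            node = placement[placed]
            if free[node] > 0:  # co-locate with the partner if there is room
                place(other, node)
            continue
        node = max(free, key=free.get)  # node with the most free slots
        if free[node] >= 2:
            place(a, node)
            place(b, node)

    # Any rank that could not be co-located goes to the emptiest node.
    for rank in ranks:
        if rank not in placement:
            place(rank, max(free, key=free.get))
    return placement


def inter_node_traffic(comm_volume, placement):
    """Total volume on edges whose endpoints sit on different nodes."""
    return sum(w for (a, b), w in comm_volume.items()
               if placement[a] != placement[b])


# Hypothetical example: four ranks, two nodes with two slots each.
example_volume = {(0, 1): 100, (2, 3): 100, (0, 2): 1, (1, 3): 1}
example_slots = {"node0": 2, "node1": 2}
example_placement = map_ranks_to_nodes(example_volume, example_slots)
example_cost = inter_node_traffic(example_volume, example_placement)
print(example_placement, example_cost)
```

In this toy case the two heavy pairs each share a node, so only the two light edges cross the network. A real system would also have to account for GPU availability per node and for power draw, neither of which this sketch models.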


Last updated: 2021-07-21
