DDF Library: Enabling functional programming in a task-based model
Journal of Parallel and Distributed Computing ( IF 3.8 ) Pub Date : 2021-02-14 , DOI: 10.1016/j.jpdc.2021.02.009
Lucas M. Ponce , Daniele Lezzi , Rosa M. Badia , Dorgival Guedes

In recent years, the areas of High-Performance Computing (HPC) and massive data processing (also known as Big Data) have been on a convergence course, since they tend to be deployed on similar hardware. HPC systems have historically performed well on regular, matrix-based computations; Big Data systems, on the other hand, have excelled at fine-grained, data-parallel workloads. HPC programming is mostly task-based, as in COMPSs, while popular Big Data environments, such as Spark, adopt the functional programming paradigm. A careful analysis shows that both approaches have pros and cons, and that integrating them may yield interesting results. With that reasoning in mind, we have developed DDF, an API and library for COMPSs that allows developers to apply Big Data techniques within that HPC environment. DDF has a functional interface, similar to many Data Science tools, which allows us to use dynamic evaluation to adapt task execution at run time. It brings some of the qualities of Big Data programming, making it easier for application domain experts to write Data Analysis jobs. In this article we discuss the API and evaluate the impact of the techniques used in its implementation, which enable a more efficient COMPSs execution. In addition, we present a performance comparison with Spark across several application patterns. The results show that each technique significantly impacts performance, allowing COMPSs to outperform Spark in many use cases.
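The abstract describes a functional interface with dynamic (lazy) evaluation: transformations are recorded and only executed when a result is requested, which lets the runtime adapt task execution. The sketch below illustrates that general style in plain Python; the names (`LazyPipeline`, `map`, `filter`, `collect`) are illustrative assumptions, not the actual DDF or COMPSs API.

```python
# Illustrative sketch of a functional, lazily evaluated pipeline in the
# style the abstract describes. NOT the real DDF API; class and method
# names are assumptions for illustration only.

class LazyPipeline:
    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []  # deferred transformations, recorded in order

    def map(self, fn):
        # Record the transformation instead of running it (lazy evaluation).
        return LazyPipeline(self._data, self._ops + [("map", fn)])

    def filter(self, pred):
        return LazyPipeline(self._data, self._ops + [("filter", pred)])

    def collect(self):
        # Action: the recorded plan runs only now, so a scheduler could
        # inspect and adapt the whole chain before execution.
        out = list(self._data)
        for kind, fn in self._ops:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

result = (LazyPipeline(range(10))
          .map(lambda x: x * x)
          .filter(lambda x: x % 2 == 0)
          .collect())
print(result)  # [0, 4, 16, 36, 64]
```

Deferring execution until `collect` is what distinguishes this style from eager task submission: the full transformation chain is visible to the runtime at once, which is the property DDF exploits to adapt COMPSs task execution.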




Updated: 2021-02-15