Graphical Flow-based Spark Programming
Journal of Big Data (IF 8.6), Pub Date: 2020-01-08, DOI: 10.1186/s40537-019-0273-5
Tanmaya Mahapatra, Christian Prehofer

The growing volume of sensing data in the Internet of Things (IoT) necessitates data analytics. Writing applications for Big Data systems is challenging because the underlying software frameworks and systems are complex and highly parallel. The difficulty is compounded by the wide range of target frameworks, each with its own data abstractions and APIs. The paper aims to reduce this complexity and the resulting learning curve by enabling domain experts, who are not necessarily skilled Big Data programmers, to develop data analytics applications with domain-specific graphical tools. The approach follows the flow-based programming paradigm used in IoT mashup tools. The paper makes two contributions: (i) a thorough analysis and classification of the widely used Spark framework, selecting the data abstractions and APIs suitable for a graphical flow-based programming paradigm, and (ii) a novel, generic approach for programming Spark from graphical flows that comprises early-stage validation and code generation of Spark applications. Use cases for Spark have been prototyped and evaluated to demonstrate code abstraction, automatic interconversion between data abstractions, and automatic generation of target Spark programs, which are key to lowering the complexity and learning curve involved in developing Big Data applications.

Updated: 2020-01-08