Upgrading a high performance computing environment for massive data processing
Journal of Internet Services and Applications Pub Date : 2019-10-16 , DOI: 10.1186/s13174-019-0118-7
Lucas M. Ponce , Walter dos Santos , Wagner Meira , Dorgival Guedes , Daniele Lezzi , Rosa M. Badia

High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience on this path of convergence with the proposal of a framework that addresses some of the programming issues derived from such integration. Our contribution is the development of an integrated environment that combines (i) COMPSs, a programming framework for the development and execution of parallel applications for distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS, and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfer, reducing execution time. The integration with Lemonade facilitates the use of COMPSs and may help its popularization in the Data Science community, by providing efficient algorithm implementations for experts from the data domain who want to develop applications with a higher-level abstraction.

Updated: 2019-10-16