Cloud-agnostic architectures for machine learning based on Apache Spark
Advances in Engineering Software (IF 4.8), Pub Date: 2021-06-05, DOI: 10.1016/j.advengsoft.2021.103029
Enikő Nagy, Róbert Lovas, István Pintye, Ákos Hajnal, Péter Kacsuk

Reference architectures for Big Data, machine learning and stream processing include not only recommended practices and interconnected building blocks but also considerations for scalability, availability, manageability, and security. However, automatically deploying multi-VM platforms based on such reference architectures on various clouds may raise several issues. The paper focuses on the widely used Apache Spark Big Data platform as the baseline and on Occopus, a cloud-agnostic orchestrator tool. The new-generation reference architectures can be configured through human-readable descriptors according to the available resources and cloud providers, and they offer various components such as Jupyter Notebook, RStudio, HDFS, and Kafka. These pre-configured reference architectures can be deployed automatically and on demand, even by data scientists themselves, using a multi-cloud approach that covers a wide range of cloud systems such as Amazon AWS, Microsoft Azure, OpenStack, OpenNebula, and CloudSigma. Occopus enables scaling of the cluster-oriented components (such as Spark) of the instantiated reference architectures. The presented solution was successfully used by the Institute for Political Science in the Hungarian Comparative Agendas Project (CAP) to classify newspaper articles.
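The abstract describes architectures configured through human-readable descriptors per cloud provider, with Occopus scaling cluster-oriented components such as Spark. The following Python code is a minimal sketch of that idea under stated assumptions: the class names, role names, size classes and the provider-to-instance-type table are all hypothetical and illustrative, and this is not the Occopus descriptor format or API. It resolves one provider-independent cluster description into concrete (role, instance type, count) triples and keeps the Spark worker count inside min/max scaling bounds.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeRole:
    """Hypothetical provider-independent node role, e.g. a Spark master or worker."""
    name: str
    min_instances: int = 1  # lower scaling bound
    max_instances: int = 1  # upper scaling bound

    def clamp(self, requested: int) -> int:
        """Keep a requested instance count inside the role's scaling bounds."""
        return max(self.min_instances, min(self.max_instances, requested))


# Hypothetical provider-specific settings (illustrative instance types only);
# a real descriptor would also carry images, networks, credentials, etc.
PROVIDER_FLAVOURS = {
    "aws": {"small": "t3.medium", "large": "m5.xlarge"},
    "openstack": {"small": "m1.medium", "large": "m1.xlarge"},
}


@dataclass
class ClusterDescriptor:
    """Hypothetical cloud-agnostic description: roles plus abstract size classes."""
    roles: list[NodeRole]
    size_class: dict[str, str] = field(default_factory=dict)  # role name -> "small"/"large"

    def resolve(self, provider: str, requested: dict[str, int]) -> list[tuple[str, str, int]]:
        """Map the abstract description onto one provider's concrete resources."""
        flavours = PROVIDER_FLAVOURS[provider]
        plan = []
        for role in self.roles:
            size = self.size_class.get(role.name, "small")
            count = role.clamp(requested.get(role.name, role.min_instances))
            plan.append((role.name, flavours[size], count))
        return plan


# Usage: the same descriptor deployed on two different clouds.
descriptor = ClusterDescriptor(
    roles=[
        NodeRole("spark-master"),
        NodeRole("spark-worker", min_instances=2, max_instances=10),
    ],
    size_class={"spark-worker": "large"},
)
print(descriptor.resolve("aws", {"spark-worker": 25}))
# [('spark-master', 't3.medium', 1), ('spark-worker', 'm5.xlarge', 10)]
print(descriptor.resolve("openstack", {"spark-worker": 1}))
# [('spark-master', 'm1.medium', 1), ('spark-worker', 'm1.xlarge', 2)]
```

Keeping the descriptor free of provider details means only the lookup table changes when switching clouds, while the scaling bounds travel with the role definition.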



Updated: 2021-06-07