Big data and machine learning framework for clouds and its usage for text classification
Concurrency and Computation: Practice and Experience (IF 1.5), Pub Date: 2020-12-21, DOI: 10.1002/cpe.6164
István Pintye 1 , Eszter Kail 1 , Péter Kacsuk 1, 2 , Róbert Lovas 1, 3

Reference architectures for big data and machine learning comprise not only interconnected building blocks but also important considerations for scalability, manageability, and usability, among others. Even when such reference architectures are leveraged, the automated deployment of distributed toolsets and frameworks on various clouds remains challenging because of the diversity of technologies and protocols. The paper focuses on the widely used Apache Spark cluster with Jupyter as the target framework, and on Occopus, a cloud-agnostic orchestrator tool, for automating its deployment and maintenance stages. The presented approach has been demonstrated and validated with a new, promising text classification application on the Hungarian academic research infrastructure, the OpenStack-based MTA Cloud. The paper explains the concept and the applied components, and illustrates their usage with real use-case measurements.

Updated: 2020-12-21