SP-BRAIN: scalable and reliable implementations of a supervised relevance-based machine learning algorithm
Soft Computing (IF 3.1), Pub Date: 2019-09-20, DOI: 10.1007/s00500-019-04366-9
Valerio Morfino, Salvatore Rampone, Emanuel Weitschek

This work describes new implementations of the U-BRAIN (Uncertainty-managing Batch Relevance-Based Artificial Intelligence) supervised machine learning algorithm. The implementations, referred to as SP-BRAIN (SP stands for Spark), aim to process large datasets efficiently. Because the algorithm is iterative and depends on in-memory data, a non-standard MapReduce paradigm is applied that addresses several memory and performance issues, e.g., the granularity of the map tasks, the reduction of shuffle operations, caching, partial data recomputation, and cluster usage. The implementations benefit from the whole set of Hadoop ecosystem components, such as HDFS, YARN, and streaming. Testing is performed in cloud execution environments, using different configurations with up to 128 cores. The performance of the new implementations is evaluated on three known datasets, and the results are compared with those of a previous parallel U-BRAIN implementation. The results show a speedup of up to 20× with good scalability and reliability in cluster environments.
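The abstract names several general techniques (coarse-grained map tasks, less shuffling, caching across iterations) without giving code. The following is a minimal sketch of those ideas in plain Python with no Spark dependency. It is not the U-BRAIN algorithm and not the authors' implementation: the toy task (greedily picking the most frequent feature in each round) and all function names are hypothetical, chosen only for illustration.

```python
from collections import Counter
from functools import reduce


def make_partitions(records, num_partitions):
    """Split the records once; the resulting list plays the role of cached, in-memory data."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(partition, excluded):
    """Coarse-grained map task: process a whole partition in one call.

    Returning one locally combined Counter per partition, rather than one
    (key, value) pair per record, is what keeps the data exchanged between
    map and reduce (the "shuffle") small.
    """
    local = Counter()
    for features in partition:
        for feature in features:
            if feature not in excluded:
                local[feature] += 1
    return local


def merge_counts(left, right):
    """Reduce step: merge two partial counters without mutating either."""
    merged = Counter(left)
    merged.update(right)
    return merged


def iterative_select(records, num_partitions=4, rounds=2):
    """Toy iterative driver (hypothetical): pick one feature per round."""
    cached = make_partitions(records, num_partitions)  # built once, reused every round
    chosen = []
    for _ in range(rounds):
        partials = [map_partition(p, set(chosen)) for p in cached]
        totals = reduce(merge_counts, partials, Counter())
        if not totals:
            break
        # Highest count wins; ties are broken alphabetically for determinism.
        best = min(totals, key=lambda f: (-totals[f], f))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    data = [("a", "b"), ("a", "c"), ("a", "b"), ("b", "c")]
    print(iterative_select(data, num_partitions=2, rounds=2))
```

In this sketch the partitions are created once and reused in every round, which mirrors the role caching plays for an iterative algorithm; only the small per-partition counters are combined in each round.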




Updated: 2020-04-22