A parallel fuzzy rule-base based decision tree in the framework of Map-Reduce, Pattern Recognition



A parallel fuzzy rule-base based decision tree in the framework of Map-Reduce
Pattern Recognition (IF 7.5), Pub Date: 2020-07-01, DOI: 10.1016/j.patcog.2020.107326
Yashuang Mu, Xiaodong Liu, Lidong Wang, Juxiang Zhou

Abstract: Decision trees are commonly used to learn and extract classification rules from data. The fuzzy rule based decision tree (FRDT) is a representative method owing to its better robustness and generalization. However, FRDT does not work well on large-scale data sets. One solution to this problem is parallel computing, for which Map-Reduce is a proven, effective model. Ensemble learning, in turn, is an effective strategy that can significantly improve the generalization ability of machine learning systems. The objective of this paper is to develop a fuzzy rule-base based decision tree that builds on both parallel computing and ensemble learning. First, we implement a parallel fusing fuzzy rule based classification system via Map-Reduce (MR-FFRCS) to show how to extract fuzzy rules from data in parallel and how to evaluate those rules in an ensemble learning way. Then, taking MR-FFRCS as a fundamental module, we propose a parallel fuzzy rule-base based decision tree (MR-FRBDT) to improve the original FRDT algorithm. The experimental studies focus mainly on feasibility and parallelism. Compared with FRDT on 23 UCI benchmark data sets, the proposed MR-FRBDT algorithm is effective with fewer parameters and is able to handle large-scale data sets. Furthermore, numerical experiments on several large-scale data sets verify its parallel performance in reducing computing time and avoiding memory restrictions.
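The abstract does not spell out how MR-FFRCS extracts or evaluates rules, so the following is only a minimal sketch of the general idea of building a fuzzy rule base in a map step and a reduce step and then classifying by a weighted vote. It is not the authors' algorithm: the rule generation is a simple grid-style scheme swapped in for illustration, the Map-Reduce framework is simulated in-process with `map` and `functools.reduce`, and all names (`LABELS`, `CENTERS`, `map_partition`, `reduce_rules`, `classify`) and the toy data are hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Assumption: every feature is scaled to [0, 1] and described by three fuzzy terms.
LABELS = ("low", "mid", "high")
CENTERS = (0.0, 0.5, 1.0)


def membership(x, center, width=0.5):
    """Triangular membership degree of x in the fuzzy term centred at `center`."""
    return max(0.0, 1.0 - abs(x - center) / width)


def fuzzify(x):
    """Return (index of the best-matching term, its membership degree)."""
    degrees = [membership(x, c) for c in CENTERS]
    best = max(range(len(CENTERS)), key=degrees.__getitem__)
    return best, degrees[best]


def map_partition(rows):
    """Map step: build a partial rule base {antecedent: {class: weight}}."""
    rules = defaultdict(lambda: defaultdict(float))
    for features, label in rows:
        antecedent, weight = [], 1.0
        for x in features:
            idx, degree = fuzzify(x)
            antecedent.append(idx)
            weight *= degree  # product of memberships as rule support
        rules[tuple(antecedent)][label] += weight
    return rules


def reduce_rules(a, b):
    """Reduce step: merge two partial rule bases by summing class weights."""
    merged = defaultdict(lambda: defaultdict(float))
    for part in (a, b):
        for antecedent, classes in part.items():
            for label, weight in classes.items():
                merged[antecedent][label] += weight
    return merged


def classify(rule_base, features):
    """Weighted vote: each rule contributes firing degree times class weight."""
    votes = defaultdict(float)
    for antecedent, classes in rule_base.items():
        fire = 1.0
        for x, idx in zip(features, antecedent):
            fire *= membership(x, CENTERS[idx])
        for label, weight in classes.items():
            votes[label] += fire * weight
    return max(votes, key=votes.get) if votes else None


# Toy data split into two partitions, standing in for distributed input splits.
partitions = [
    [((0.1, 0.2), "A"), ((0.9, 0.8), "B")],
    [((0.0, 0.1), "A"), ((1.0, 0.9), "B")],
]
rule_base = reduce(reduce_rules, map(map_partition, partitions))
print(classify(rule_base, (0.05, 0.1)))
print(classify(rule_base, (0.95, 0.9)))
```

Because each partition is processed independently and the merge is associative, the map step could in principle run on separate workers without holding the full data set in memory, which is the kind of benefit the abstract attributes to the parallel design.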


Updated: 2020-07-01
