A Distributed Fuzzy Associative Classifier for Big Data, IEEE Transactions on Cybernetics


A Distributed Fuzzy Associative Classifier for Big Data
IEEE Transactions on Cybernetics (IF 11.8), Pub Date: 2018-09-01, DOI: 10.1109/tcyb.2017.2748225
Armando Segatori, Alessio Bechini, Pietro Ducange, Francesco Marcelloni

Fuzzy associative classification has not been widely analyzed in the literature, although associative classifiers (ACs) have proved to be very effective in a range of real-world application domains. The main reason is that learning fuzzy ACs is computationally demanding, especially on large datasets. To overcome this drawback, we propose an efficient distributed fuzzy associative classification approach based on the MapReduce paradigm. The approach uses a novel distributed discretizer based on fuzzy entropy to efficiently generate fuzzy partitions of the attributes. A set of candidate fuzzy association rules is then generated by a distributed fuzzy extension of the well-known FP-Growth algorithm. Finally, this set is pruned using three purposely adapted types of pruning. We implemented our approach on the popular Hadoop framework, which distributes the storage and processing of very large datasets across computer clusters built from commodity hardware. We carried out extensive experiments and a detailed analysis of the results on six very large datasets with up to 11 million instances. We also experimented with different types of reasoning methods. Focusing on accuracy, model complexity, computation time, and scalability, we compare the results achieved by our approach with those obtained by two distributed nonfuzzy ACs recently proposed in the literature. We highlight that, although the accuracies are comparable, the complexity of the classifiers generated by the fuzzy distributed approach, measured as the number of rules, is lower than that of the nonfuzzy classifiers.
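The abstract does not give formulas for how a fuzzy association rule is scored, so the following is only a minimal single-machine sketch under stated assumptions, not the authors' distributed implementation. It assumes triangular membership functions on a feature normalized to [0, 1], the product as the conjunction operator, and the definitions of fuzzy support and confidence that are common in the fuzzy associative classification literature. The partition, function names, and toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical fuzzy partition of a feature normalized to [0, 1].
# Each label maps to (left foot, peak, right foot) of a triangle.
# The paper learns its partitions with a fuzzy-entropy discretizer;
# this fixed partition is only an illustrative assumption.
PARTITION: Dict[str, Tuple[float, float, float]] = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def membership(x: float, label: str) -> float:
    """Degree to which value x belongs to the fuzzy set named by label."""
    a, b, c = PARTITION[label]
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def matching_degree(instance: List[float], antecedent: Dict[int, str]) -> float:
    """Product of the membership degrees over the antecedent's conditions."""
    degree = 1.0
    for feature_index, label in antecedent.items():
        degree *= membership(instance[feature_index], label)
    return degree


def fuzzy_support_confidence(
    data: List[List[float]],
    labels: List[str],
    antecedent: Dict[int, str],
    target_class: str,
) -> Tuple[float, float]:
    """Fuzzy support and confidence of the rule 'antecedent -> target_class'."""
    total = 0.0       # matching degree summed over all instances
    with_class = 0.0  # matching degree summed over instances of target_class
    for instance, label in zip(data, labels):
        degree = matching_degree(instance, antecedent)
        total += degree
        if label == target_class:
            with_class += degree
    support = with_class / len(data)
    confidence = with_class / total if total > 0 else 0.0
    return support, confidence


# Toy data: one feature, three instances.
data = [[0.0], [0.25], [1.0]]
labels = ["A", "A", "B"]
print(fuzzy_support_confidence(data, labels, {0: "low"}, "A"))  # (0.5, 1.0)
```

In a MapReduce setting, sums such as `total` and `with_class` are the kind of quantity that can be accumulated per data split and then combined, which is one reason rule scoring of this form lends itself to distribution.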

Updated: 2018-09-01
