An overview of recent distributed algorithms for learning fuzzy models in Big Data classification, Journal of Big Data

Journal of Big Data (IF 8.1), Pub Date: 2020-03-10, DOI: 10.1186/s40537-020-00298-6
Pietro Ducange, Michela Fazzolari, Francesco Marcelloni

Nowadays, huge amounts of data are generated, often in very short time intervals and in various formats, by many heterogeneous sources such as social networks and media, mobile devices, internet transactions, networked devices and sensors. These data, referred to in the literature as Big Data, are characterized by the popular Vs, such as Value, Veracity, Variety, Velocity and Volume. In particular, Value concerns the useful knowledge that can be mined from the data. Thus, in recent years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have generally been implemented using ad-hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models currently play a significant role, thanks to their ability to handle vague and imprecise data and their inherent interpretability. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. In particular, we first present some design and implementation details of these learning algorithms. Thereafter, we compare them in terms of accuracy and interpretability. Finally, we discuss their scalability.
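As an illustration only, and not taken from the paper, here is a minimal sketch of how a fuzzy rule-based classifier can be learned in the MapReduce style mentioned above. It loosely follows Chi et al.'s rule-generation scheme: each example yields a rule whose antecedent uses the highest-membership fuzzy set per feature. Everything specific here is an assumption: features normalized to [0, 1], three uniform triangular fuzzy sets per feature, the product t-norm, a rule weight equal to the winning class's share of the summed matching degrees, and plain Python `map`/`reduce` standing in for Hadoop or Spark. All function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from math import prod

N_LABELS = 3  # assumption: three uniform triangular fuzzy sets per feature (low, medium, high)


def membership(x, label, n_labels=N_LABELS):
    """Degree to which x in [0, 1] belongs to the given uniform triangular fuzzy set."""
    center = label / (n_labels - 1)
    width = 1.0 / (n_labels - 1)
    return max(0.0, 1.0 - abs(x - center) / width)


def best_labels(x):
    """Chi-style antecedent for example x: the highest-membership label of each feature."""
    return tuple(max(range(N_LABELS), key=lambda lab: membership(v, lab)) for v in x)


def matching_degree(antecedent, x):
    """Product t-norm of the memberships of x in the antecedent's fuzzy sets."""
    return prod(membership(v, lab) for v, lab in zip(x, antecedent))


def map_partition(partition):
    """Map phase: per-antecedent, per-class sums of matching degrees for one data chunk."""
    stats = defaultdict(lambda: defaultdict(float))
    for x, y in partition:
        ant = best_labels(x)
        stats[ant][y] += matching_degree(ant, x)
    return stats


def merge(acc, partial):
    """Reduce step: add one partition's per-class sums into the accumulator."""
    for ant, class_sums in partial.items():
        for cls, s in class_sums.items():
            acc[ant][cls] += s
    return acc


def build_rules(stats):
    """Consequent = class with the largest sum; weight = that class's share of the total."""
    rules = {}
    for ant, class_sums in stats.items():
        cls = max(class_sums, key=class_sums.get)
        rules[ant] = (cls, class_sums[cls] / sum(class_sums.values()))
    return rules


def classify(rules, x):
    """Single winning rule: the class of the rule maximizing matching degree times weight."""
    ant, (cls, _) = max(rules.items(), key=lambda r: matching_degree(r[0], x) * r[1][1])
    return cls


# Toy data split into two "partitions", as a distributed framework would see it.
partitions = [
    [((0.1, 0.2), "A"), ((0.9, 0.8), "B")],
    [((0.15, 0.1), "A"), ((0.95, 0.9), "B"), ((0.5, 0.5), "A")],
]
stats = reduce(merge, map(map_partition, partitions), defaultdict(lambda: defaultdict(float)))
rules = build_rules(stats)
print(classify(rules, (0.05, 0.1)))  # prints A
```

The map phase emits per-class sums rather than finished rules. The reduce phase can therefore rebuild the same rule base a single machine would produce from the whole dataset, up to floating-point rounding. Real distributed learners may instead combine locally weighted rules, trading this exactness for less communication.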

Updated: 2020-04-21