Chicken swarm foraging algorithm for big data classification using the deep belief network classifier, Data Technologies and Applications

Data Technologies and Applications (IF 1.7), Pub Date: 2020-07-28, DOI: 10.1108/dta-08-2019-0146
Sathyaraj R, Ramanathan L, Lavanya K, Balasubramanian V, Saira Banu J

Purpose

Innovation in big data is advancing so quickly that conventional software tools face several problems in managing big data. Moreover, the occurrence of imbalanced data in massive data sets is a major constraint for the research community.

Design/methodology/approach

This paper introduces a big data classification technique built on the MapReduce framework and driven by an optimization algorithm. The proposed optimizer, named the chicken-based bacterial foraging (CBF) algorithm, is generated by integrating the bacterial foraging optimization (BFO) algorithm with the cat swarm optimization (CSO) algorithm. The model runs in two phases: training and testing.

In the training phase, big data produced by different distributed sources is processed in parallel by the mappers, which perform preprocessing and feature selection based on the proposed CBF algorithm. Preprocessing eliminates redundant and inconsistent data; feature selection then extracts the significant features from the preprocessed data to improve classification accuracy. The selected features are fed into the reducers, where a deep belief network (DBN) classifier trained with the proposed CBF algorithm classifies the data into various classes. At the end of training, each reducer outputs a trained model, and these models allow the incremental data to be handled effectively.

In the testing phase, the incremental data are split into subsets and fed into different mappers. Each mapper holds a trained model obtained from the training phase and uses it to classify its share of the incremental data. The outputs of the mappers are then fused and fed into the reducer for classification.
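The training phase described above can be pictured with a minimal sketch. It is an illustration under loud assumptions, not the authors' implementation: variance ranking stands in for CBF-based feature selection, a nearest-centroid model stands in for the CBF-trained DBN, and the names `preprocess`, `select_features`, `train_mapper` and `train_reducer` are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def preprocess(rows):
    """Drop exact duplicates and rows with missing values
    (a simple stand-in for removing redundant/inconsistent data)."""
    seen, clean = set(), []
    for features, label in rows:
        key = (tuple(features), label)
        if None in features or key in seen:
            continue
        seen.add(key)
        clean.append((list(features), label))
    return clean


def select_features(rows, k):
    """Return the indices of the k highest-variance columns.
    ASSUMPTION: variance ranking replaces the CBF-driven selection."""
    n_cols = len(rows[0][0])
    variances = [pvariance([f[j] for f, _ in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def train_mapper(partition, k):
    """Mapper: preprocess one data partition, then project it onto k features."""
    clean = preprocess(partition)
    keep = select_features(clean, k)
    return keep, [([f[j] for j in keep], y) for f, y in clean]


def train_reducer(keep, rows):
    """Reducer: fit a per-class centroid model.
    ASSUMPTION: nearest-centroid replaces the CBF-trained DBN."""
    sums, counts = {}, defaultdict(int)
    for features, label in rows:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
    return {"features": keep, "centroids": centroids}


# One partition with a duplicate row and a row containing a missing value.
partition = [
    ([1.0, 5.0, 0.0], "a"),
    ([1.0, 5.0, 0.0], "a"),
    ([1.0, 9.0, 0.1], "b"),
    ([1.0, None, 0.2], "b"),
]
keep, projected = train_mapper(partition, k=2)
model = train_reducer(keep, projected)
```

Each partition yields its own model here, mirroring the statement that the individual reducers present the trained models; how the real system shares selected features across partitions is not specified in the abstract.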

Findings

The maximum accuracy and Jaccard coefficient are obtained on the epileptic seizure recognition database. The proposed CBF-DBN achieves a maximal accuracy of 91.129%, whereas the existing neural network (NN), DBN and naive Bayes classifier-term frequency–inverse document frequency (NBC-TFIDF) methods achieve 82.894%, 86.184% and 86.512%, respectively. The proposed CBF-DBN also achieves a maximal Jaccard coefficient of 88.928%, whereas the existing NN, DBN and NBC-TFIDF methods achieve 75.891%, 79.850% and 81.103%, respectively.
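For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute them from true and predicted labels. The abstract does not state how the Jaccard coefficient was computed for this data set, so the binary form TP / (TP + FP + FN) used here is an assumption, as are the function names and the toy labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def jaccard(y_true, y_pred, positive=1):
    """Jaccard coefficient for one class: TP / (TP + FP + FN).
    ASSUMPTION: binary, single-class form; the paper's exact variant is not given."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = tp + fp + fn
    # Convention chosen here: two empty positive sets count as identical.
    return tp / denom if denom else 1.0


# Toy labels for illustration only; these are not data from the paper.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
acc = accuracy(y_true, y_pred)
jac = jaccard(y_true, y_pred)
```

Because the Jaccard coefficient ignores true negatives, it is never higher than accuracy on the same binary predictions, which is consistent with the Jaccard figures above sitting below the corresponding accuracy figures.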

Originality/value

This paper proposes a big data classification method for categorizing massive data sets while meeting the constraints of huge data. Classification is performed on the MapReduce framework in training and testing phases, so that the data are handled in parallel. In the training phase, the big data is partitioned into subsets that are fed into the mappers, where a feature extraction step extracts the significant features. The extracted features are passed to the reducers, which classify the data using a DBN classifier trained with the proposed CBF algorithm; the trained model is the output of this phase. In the testing phase, the incremental data are classified: new data are first split into subsets and fed into the mappers, which classify them using the trained models from the training phase. The classified results from each mapper are fused and fed into the reducer for the classification of big data.
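The testing-phase flow (split, classify in each mapper, fuse in the reducer) can be sketched as follows. This is only an illustration: the abstract does not say how the mapper outputs are fused, so merging them back into the original record order is an assumption, and the threshold rule `toy_model` is a hypothetical stand-in for a trained DBN. The names `split`, `classify_mapper` and `fuse_reducer` are likewise hypothetical.

```python
def split(records, n_mappers):
    """Round-robin split that remembers each record's original position."""
    subsets = [[] for _ in range(n_mappers)]
    for index, record in enumerate(records):
        subsets[index % n_mappers].append((index, record))
    return subsets


def classify_mapper(subset, model):
    """Mapper: apply the trained model to every record in one subset."""
    return [(index, model(record)) for index, record in subset]


def fuse_reducer(mapper_outputs):
    """Reducer: merge all mapper outputs and restore the input order.
    ASSUMPTION: fusion is a plain merge; the paper's rule is not given."""
    merged = [pair for output in mapper_outputs for pair in output]
    return [label for _, label in sorted(merged, key=lambda pair: pair[0])]


def toy_model(record):
    """Hypothetical stand-in for a trained classifier."""
    return "high" if sum(record) > 1.0 else "low"


incremental = [[0.2, 0.1], [0.9, 0.8], [0.5, 0.6], [0.0, 0.3]]
outputs = [classify_mapper(s, toy_model) for s in split(incremental, 3)]
labels = fuse_reducer(outputs)
```

In a real MapReduce deployment the list comprehension over subsets would be replaced by mappers running in parallel on separate nodes; the sequential loop here only shows the data flow.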

Updated: 2020-07-28
