Big data classification using deep learning and Apache Spark architecture, Neural Computing and Applications



Neural Computing and Applications (IF 4.5), Pub Date: 2021-07-07, DOI: 10.1007/s00521-021-06145-w
Anilkumar V. Brahmane, B. Chaitanya Krishna


The irregularity of big data is rising steadily, so current software tools have difficulty managing it. Moreover, the rate of irregular data in very large datasets is a key constraint for the research community. This paper therefore proposes a novel method for handling big data using the Spark framework. The proposed method classifies big data in two stages, feature selection and classification, which are performed in the initial nodes of the Spark architecture. The proposed optimization algorithm, named the Rider Chaotic Biogeography Optimization (RCBO) algorithm, integrates the Rider Optimization Algorithm (ROA) with standard chaotic biogeography-based optimization (CBBO). The proposed RCBO-based deep stacked auto-encoder, running on the Spark framework, handles big data effectively to achieve robust big data classification. Here, RCBO is used to select suitable features from the massive dataset. In addition, the deep stacked auto-encoder is trained with RCBO in order to classify large datasets. This research focuses on the problem of managing big data, using the Covertype dataset from the UCI Machine Learning Repository. The dataset describes forest cover data and is used to predict forest cover type from cartographic variables. It is multivariate, with 263,361 web hits. It has 581,012 instances and 54 attributes, and its associated task is classification. Evaluation of the proposed RCBO-based deep stacked auto-encoder on the Spark framework, using UCI machine learning datasets, showed that the proposed method outperformed other methods, achieving a maximal accuracy of 86.71%, Dice coefficient of 92.7%, sensitivity of 75.2%, and specificity of 95.4%.
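The four evaluation measures reported above have standard definitions in terms of confusion-matrix counts. The sketch below shows those textbook formulas for a single binary (one-vs-rest) split. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper, and the abstract does not say how the authors aggregate the measures over the seven cover types, so this should be read as an illustration of the definitions rather than the authors' evaluation code.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification measures from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives for one class treated one-vs-rest (an assumption here).
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(num: float, den: float) -> float:
        # Return 0.0 when a measure is undefined (zero denominator).
        return num / den if den else 0.0

    return {
        "accuracy": (tp + tn) / total,                 # correct / all
        "dice": ratio(2 * tp, 2 * tp + fp + fn),       # same as F1 score
        "sensitivity": ratio(tp, tp + fn),             # true positive rate
        "specificity": ratio(tn, tn + fp),             # true negative rate
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, sensitivity is 80/100 and specificity is 90/100, which shows how a classifier can have high specificity and lower sensitivity at the same time, the same pattern as in the figures reported above.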


Updated: 2021-07-07
