An improved query optimization process in big data using ACO-GA algorithm and HDFS map reduce technique,Distributed and Parallel Databases


An improved query optimization process in big data using ACO-GA algorithm and HDFS map reduce technique
Distributed and Parallel Databases (IF 1.5), Pub Date: 2020-01-29, DOI: 10.1007/s10619-020-07285-z
Deepak Kumar, Vijay Kumar Jha

Storing and retrieving data within a specific time frame is fundamental to any application today. An efficiently designed query lets the user get results in the desired time and lends credibility to the corresponding application. To address the difficulty of query optimization, this paper proposes an improved query optimization process for big data (BD) using the ACO-GA algorithm and HDFS map-reduce. The proposed methodology consists of two phases: BD arrangement and query optimization. In the first phase, the input data is pre-processed by computing a hash value (HV) with the SHA-512 algorithm and removing repeated data with the HDFS map-reduce function. Then, features such as closed frequent patterns, support, and confidence are extracted. Next, the support and confidence are managed using an entropy calculation. Based on the entropy calculation, related information is grouped using the Normalized K-Means (NKM) algorithm. In the second phase, the BD queries are collected and the same features are extracted from them. Next, the optimized query is found using the ACO-GA algorithm. Finally, a similarity assessment is performed. The experimental results show that the proposed algorithm outperformed other existing algorithms.
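
The first-phase steps named in the abstract (SHA-512 hashing, duplicate removal in a map-reduce style, and the support and confidence features) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: it runs in plain Python rather than on Hadoop/HDFS, and the function names (`map_phase`, `reduce_phase`, `deduplicate`, `support`, `confidence`) and the sample data are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict


def map_phase(records):
    """Map step (illustrative): emit (SHA-512 hex digest, record) pairs."""
    for record in records:
        digest = hashlib.sha512(record.encode("utf-8")).hexdigest()
        yield digest, record


def reduce_phase(pairs):
    """Reduce step (illustrative): keep one record per digest, first-seen order."""
    groups = defaultdict(list)
    for digest, record in pairs:
        groups[digest].append(record)
    return [group[0] for group in groups.values()]


def deduplicate(records):
    """Remove repeated records by grouping them on their SHA-512 hash value."""
    return reduce_phase(map_phase(records))


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper.
    print(deduplicate(["a,b", "c", "a,b"]))
    baskets = [["a", "b"], ["a"], ["b"], ["a", "b"]]
    print(support(baskets, ["a", "b"]))
    print(confidence(baskets, ["a"], ["b"]))
```

On a real cluster the grouping by digest would be handled by the framework's shuffle between map and reduce tasks; here a dictionary stands in for that step. The entropy calculation, NKM clustering, and ACO-GA search are not covered, since the abstract does not specify them in enough detail to sketch faithfully.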


Updated: 2020-01-29
