A technique for parallel query optimization using MapReduce framework and a semantic-based clustering method
PeerJ Computer Science (IF 3.5). Published: 2021-06-01. DOI: 10.7717/peerj-cs.580
Elham Azhir, Nima Jafari Navimipour, Mehdi Hosseinzadeh, Arash Sharifi, Aso Darwesh

Query optimization is the process of identifying the best Query Execution Plan (QEP). The query optimizer produces a close-to-optimal QEP for a given query based on minimum resource usage. The problem is that a given query has many equivalent execution plans, each with its own execution cost, so producing an effective query plan requires examining a large number of alternative plans. Access plan recommendation is an alternative technique for database query optimization that reuses previously generated QEPs to execute new queries. In this technique, the query optimizer uses clustering methods to identify groups of similar queries. However, clustering large query datasets is challenging for traditional clustering algorithms because of the long processing time required. Numerous cloud-based platforms, such as Hadoop, Hive, and Pig, have been introduced that offer low-cost solutions for processing distributed queries. This paper applies and tests a model for clustering large query datasets of varying sizes in parallel using MapReduce. The results demonstrate the effectiveness of the parallel implementation of query workload clustering in achieving good scalability.
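To make the idea of clustering a query workload with MapReduce and then reusing a cached plan more concrete, here is a minimal sketch. It is not the authors' semantic-based method. It swaps in plain k-means, with each iteration written as a map phase (assign every query vector to its nearest centroid) and a reduce phase (average the vectors per cluster). It assumes queries are represented as bag-of-token count vectors over a hypothetical vocabulary `VOCAB`. Everything runs in a single process; on Hadoop, `map_phase` would run in parallel over input splits. The function names and the `plan_cache` plan strings are illustrative placeholders only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical feature vocabulary: clause keywords and table names.
VOCAB = ["where", "join", "group", "orders", "customers"]

def featurize(sql):
    """Turn a SQL string into a bag-of-tokens count vector over VOCAB."""
    counts = Counter(re.findall(r"[a-z_]+", sql.lower()))
    return tuple(float(counts[t]) for t in VOCAB)

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to p (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))

def map_phase(points, centroids):
    """Map: emit (cluster id, (vector, 1)) for each query vector."""
    for p in points:
        yield nearest(p, centroids), (p, 1)

def reduce_phase(pairs):
    """Shuffle by key, then Reduce: average the vectors in each cluster."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    result = {}
    for key, values in groups.items():
        n = sum(count for _, count in values)
        sums = [sum(vec[d] for vec, _ in values) for d in range(len(VOCAB))]
        result[key] = tuple(s / n for s in sums)
    return result

def kmeans_mapreduce(points, init, iterations=10):
    """Run k-means as repeated map/reduce rounds (one job per iteration)."""
    centroids = list(init)
    for _ in range(iterations):
        updated = reduce_phase(map_phase(points, centroids))
        # A cluster that received no points keeps its previous centroid.
        new = [updated.get(i, c) for i, c in enumerate(centroids)]
        if new == centroids:  # converged: assignments no longer move centroids
            break
        centroids = new
    return centroids

# Toy workload (hypothetical), clustered into two groups.
workload = [
    "SELECT name FROM customers WHERE id = 7",
    "SELECT name FROM customers WHERE city = 'Oslo'",
    "SELECT total FROM orders JOIN customers ON orders.cid = customers.id GROUP BY total",
    "SELECT SUM(total) FROM orders JOIN customers ON orders.cid = customers.id GROUP BY city",
]
points = [featurize(q) for q in workload]
centroids = kmeans_mapreduce(points, [points[0], points[2]])

# Placeholder plans: one previously generated QEP cached per cluster.
plan_cache = {0: "IndexScan(customers)", 1: "HashJoin(orders, customers) -> HashAggregate"}

# Recommend a plan for a new query by reusing its nearest cluster's QEP.
new_query = "SELECT email FROM customers WHERE id = 3"
print(plan_cache[nearest(featurize(new_query), centroids)])  # prints IndexScan(customers)
```

Writing each iteration as an independent map/reduce round mirrors how iterative clustering is commonly expressed on MapReduce. In a real Hadoop job, a combiner could pre-sum vectors on each mapper to cut shuffle traffic before the reducers compute the averages.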

Updated: 2021-06-01