Prefetched wald adaptive boost classification based Czekanowski similarity MapReduce for user query processing with bigdata, Distributed and Parallel Databases


Distributed and Parallel Databases (IF 1.5), Pub Date: 2021-01-05, DOI: 10.1007/s10619-020-07319-6
S. Tamil Selvan, P. Balamurugan, M. Vijayakumar

The large volumes of data generated in recent years, together with the rise of big data analytics on social media, call for accurate user query processing with minimal time complexity. Several research works have addressed the accuracy and time complexity of query processing. This work introduces the Wald Adaptive Prefetched Boosting Classification based Czekanowski Similarity MapReduce (WAPBC–CSMR) technique, which uses a big dataset to process a large number of user queries. First, Wald Adaptive Prefetched Boosting is employed to classify the big dataset into different classes. To reduce classification time, a classifier called Gaussian distributive Rocchio is used, which achieves significant classification in minimum time. A Likelihood Ratio Test is then applied to combine the weak learner results into a strong classification result, and the classified and refined data are stored in the prefetcher cache. When the prefetch manager receives multi-dimensional user queries, the queries are split into multiple keywords and fed into the map phase, where the mapping function uses the Czekanowski Similarity Index to identify repeated jobs with maximum query processing accuracy. The relevant data are then retrieved from the prefetcher cache, and repeated user query tasks are removed in the reduce phase via a statistical function, which reduces processing time. WAPBC–CSMR is evaluated on a big dataset using query processing accuracy, error rate, and processing time as metrics for varying numbers of user queries. The results show that WAPBC–CSMR improves query processing accuracy and reduces both processing time and error rate compared with conventional methods.
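The map and reduce steps described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names, the whitespace keyword splitting, and the 0.8 similarity threshold are all assumptions made for illustration, and the abstract does not specify the statistical function used in the reduce phase. The sketch uses the quantitative Czekanowski index, 2·Σ min(x_i, y_i) / (Σ x_i + Σ y_i), over keyword counts.

```python
from collections import Counter


def czekanowski(a: Counter, b: Counter) -> float:
    """Quantitative Czekanowski index between two keyword-count bags (0.0 to 1.0)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 1.0  # assumption: two empty queries are treated as identical
    shared = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    return 2.0 * shared / total


def map_phase(queries, threshold=0.8):
    """Split each query into keywords and emit (representative_id, query_id) pairs.

    A query whose similarity to an earlier representative reaches the
    (hypothetical) threshold is tagged as a repeat of that representative.
    """
    representatives = []  # (query_id, keyword bag) of queries judged distinct so far
    pairs = []
    for qid, text in enumerate(queries):
        bag = Counter(text.lower().split())
        match = next(
            (rid for rid, rbag in representatives
             if czekanowski(bag, rbag) >= threshold),
            None,
        )
        if match is None:
            representatives.append((qid, bag))
            match = qid
        pairs.append((match, qid))
    return pairs


def reduce_phase(pairs):
    """Group query ids by representative so each repeated job is executed once."""
    groups = {}
    for rep_id, qid in pairs:
        groups.setdefault(rep_id, []).append(qid)
    return groups


if __name__ == "__main__":
    queries = [
        "big data query processing",
        "query processing big data",
        "weather forecast tomorrow",
    ]
    print(reduce_phase(map_phase(queries)))
```

In this sketch only the keys of the returned dictionary would need to be looked up in the prefetcher cache; the remaining ids in each group reuse that result. A real deployment would distribute the map and reduce functions over a MapReduce framework rather than run them in one process.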


Updated: 2021-01-05
