SOOM: Sort-Based Optimizer for Big Data Multi-Query

Big Data (IF 2.6), Pub Date: 2020-02-01, DOI: 10.1089/big.2019.0023
Radhya Sahal (1, 2), Mohammed H Khafagy (3), Fatma A Omara (1)


Sorting is a common operation in many applications; it consumes resources and therefore adds computational overhead. In Big Data multi-query workloads, the shared sort operations are fairly large and incur high I/O costs, whether the sorts are explicit or implicit. In particular, multi-query workloads that include aggregation and sort operations take a long time to execute because similar tasks reshuffle the same data multiple times. Exploiting the data-sharing and sort-sharing opportunities among similar tasks therefore makes it possible to reuse previous results to optimize multi-query execution. To address data sharing, our previous work, the Multi-Query Optimization Using Tuple Size and Histogram (MOTH) system, considers the granularity of data-sharing opportunities among multiple queries. However, it does not consider the time overhead of redundant in-network data movement (i.e., the shuffling time needed to transfer intermediate data for sort operations). The MOTH system has therefore been extended to the SOOM (Sort-Based Optimizer over MOTH) system, which exploits sort-sharing opportunities, including the explicit sorts of sort queries and the implicit sorts of aggregation queries. SOOM adds two modules for this purpose, the query explorer and the sort exploiter, which build on the existing MOTH system to optimize multiple aggregation and sort queries. Experimental evaluation on Hadoop-like infrastructures shows that SOOM outperforms the naive and state-of-the-art techniques in query execution time by 45% and 30%, respectively, while achieving a maximal intermediate data size reduction of 67% and 61% on average, respectively.
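The idea of sharing a sort between an explicit sort query and the implicit sort of an aggregation query can be illustrated with a toy example. The following is a minimal sketch in plain Python, not the SOOM implementation: the table `ROWS`, the `SortCounter` class, and the functions `sum_per_key`, `run_naive`, and `run_shared` are all hypothetical names introduced here, and the number of sorts performed is used only as a rough stand-in for shuffle cost.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical toy table of (region, amount) rows; not data from the paper.
ROWS = [("east", 5), ("west", 3), ("east", 2), ("north", 7), ("west", 1)]


class SortCounter:
    """Counts full sorts executed, as a rough stand-in for shuffle cost."""

    def __init__(self):
        self.sorts = 0

    def sort_by_key(self, rows):
        self.sorts += 1
        # sorted() is stable, so rows with equal keys keep their input order.
        return sorted(rows, key=itemgetter(0))


def sum_per_key(sorted_rows):
    # Aggregation over rows already sorted by key (the "implicit" sort).
    return {
        key: sum(amount for _, amount in group)
        for key, group in groupby(sorted_rows, key=itemgetter(0))
    }


def run_naive(rows, counter):
    # Each query sorts the same data independently.
    ordered = counter.sort_by_key(rows)              # explicit sort query
    totals = sum_per_key(counter.sort_by_key(rows))  # aggregation query
    return ordered, totals


def run_shared(rows, counter):
    # Sort once; both queries reuse the same sorted intermediate result.
    ordered = counter.sort_by_key(rows)
    totals = sum_per_key(ordered)
    return ordered, totals
```

Both functions return the same answers for the two queries; the shared variant simply performs one sort instead of two. In a Hadoop-like setting the saving would come from avoiding a second shuffle of the same intermediate data, which this single-process sketch does not model.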


Updated: 2020-02-01
