Lachesis: Automated Generation of Persistent Partitionings for UDF-Centric Analytics
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-06-30, DOI: arxiv-2006.16529
Jia Zou, Pratik Barhate, Amitabh Das, Arun Iyengar, Binhang Yuan, Dimitrije Jankov, Chris Jermaine

Persistent partitioning is effective in avoiding expensive shuffling operations. However, automating this process remains a significant challenge for Big Data analytics workloads that make extensive use of user-defined functions (UDFs), where sub-computations are harder to reuse for partitioning than in relational applications. In addition, functional dependencies, which are widely used for partitioning selection, are often unavailable in the unstructured data that is ubiquitous in UDF-centric analytics. We propose the Lachesis system, which represents UDF-centric workloads as workflows of analyzable and reusable sub-computations. Lachesis further adopts a deep reinforcement learning model to infer which sub-computations should be used to partition the underlying data. This analysis is then applied to automatically optimize the storage of data across applications, improving both performance and user productivity.
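
To illustrate the first claim of the abstract, the following is a minimal Python sketch of why persisting data partitioned by a key-extracting sub-computation lets a later join run without a shuffle. It is not Lachesis's implementation or API: the names `persist_partitioned`, `local_join`, and `stable_partition`, the CRC32-based hash partitioner, and the toy `users` and `events` records are all assumptions made for illustration only.

```python
import zlib
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
KeyUdf = Callable[[Record], str]  # hypothetical key-extracting sub-computation


def stable_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition id with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def persist_partitioned(records: List[Record], key_udf: KeyUdf,
                        num_partitions: int) -> List[List[Record]]:
    """Store records once, partitioned by the output of key_udf."""
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for record in records:
        pid = stable_partition(key_udf(record), num_partitions)
        partitions[pid].append(record)
    return partitions


def local_join(left_parts: List[List[Record]], right_parts: List[List[Record]],
               left_key: KeyUdf, right_key: KeyUdf) -> List[Tuple[Record, Record]]:
    """Join co-partitioned datasets partition by partition, with no data movement."""
    assert len(left_parts) == len(right_parts), "datasets must be co-partitioned"
    joined: List[Tuple[Record, Record]] = []
    for left, right in zip(left_parts, right_parts):
        # Build a hash index over the left partition only.
        index: Dict[str, List[Record]] = {}
        for l in left:
            index.setdefault(left_key(l), []).append(l)
        # Probe with the matching right partition; equal keys share a partition id.
        for r in right:
            for l in index.get(right_key(r), []):
                joined.append((l, r))
    return joined


# Toy data, invented for this sketch.
users = [{"id": "u1", "name": "alice"}, {"id": "u2", "name": "bob"}]
events = [
    {"user_id": "u1", "action": "click"},
    {"user_id": "u2", "action": "view"},
    {"user_id": "u1", "action": "purchase"},
    {"user_id": "u3", "action": "click"},  # no matching user
]

user_parts = persist_partitioned(users, lambda r: r["id"], 4)
event_parts = persist_partitioned(events, lambda r: r["user_id"], 4)
pairs = local_join(user_parts, event_parts,
                   lambda r: r["id"], lambda r: r["user_id"])
```

Because both datasets were stored with the same partitioner applied to the output of their key functions, matching records always land in the same partition, so the join only ever compares partition `i` of one dataset with partition `i` of the other. Choosing which sub-computation to partition by is the decision the abstract says Lachesis automates; this sketch simply fixes that choice by hand.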

Updated: 2020-10-13