A Unified Transferable Model for ML-Enhanced DBMS
arXiv - CS - Databases Pub Date : 2021-05-06 , DOI: arxiv-2105.02418
Ziniu Wu, Peilun Yang, Pei Yu, Rong Zhu, Yuxing Han, Yaliang Li, Defu Lian, Kai Zeng, Jingren Zhou

Recently, the database management system (DBMS) community has witnessed the power of machine learning (ML) solutions for DBMS tasks. Despite their promising performance, these existing solutions can hardly be considered satisfactory. First, the ML-based methods in DBMS are not effective enough, because each is optimized for a specific task and cannot explore or exploit the intrinsic connections between tasks. Second, the training process has serious limitations that hinder practicality, because these methods must retrain the entire model from scratch for each new DB. Moreover, each retraining requires an excessive amount of training data, which is very expensive to acquire and unavailable for a new DB. We propose to explore the transferability of ML methods both across tasks and across DBs to tackle these fundamental drawbacks. In this paper, we propose a unified model, MTMLF, that uses a multi-task training procedure to capture the transferable knowledge across tasks and a pretrain-finetune procedure to distill the transferable meta knowledge across DBs. We believe this paradigm is better suited to cloud DB services and has the potential to revolutionize the way ML is used in DBMS. Furthermore, to demonstrate the predictive power and viability of MTMLF, we present a concrete and very promising case study on query optimization tasks. Last but not least, we discuss several concrete research opportunities along this line of work.
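The core idea of the abstract — a shared component that carries transferable knowledge across tasks and DBs, topped by small task-specific heads — can be illustrated with a minimal sketch. This is not the paper's actual MTMLF architecture; all class names, the toy feature vectors, and the two example tasks (cardinality and cost estimation, two common query-optimization subtasks) are illustrative assumptions.

```python
# A toy sketch of the multi-task, pretrain-finetune paradigm:
# one shared encoder (pretrained across many DBs) feeds several
# lightweight task-specific heads, so only the heads (and perhaps
# a little of the encoder) need finetuning for a new DB.

class SharedEncoder:
    """Holds the transferable meta knowledge distilled across DBs."""
    def __init__(self, weights):
        self.weights = weights  # stands in for pretrained parameters

    def encode(self, query_features):
        # Toy "encoding": a weighted sum of the query's features.
        return sum(w * x for w, x in zip(self.weights, query_features))


class TaskHead:
    """Small task-specific layer, e.g. cardinality or cost estimation."""
    def __init__(self, scale, bias):
        self.scale, self.bias = scale, bias

    def predict(self, shared_repr):
        return self.scale * shared_repr + self.bias


# Pretrain the encoder once (simulated here by fixed weights),
# then reuse it for every task on a new DB:
encoder = SharedEncoder(weights=[0.5, 1.0, 2.0])   # shared across tasks/DBs
card_head = TaskHead(scale=10.0, bias=1.0)         # cardinality-estimation head
cost_head = TaskHead(scale=3.0, bias=0.5)          # cost-estimation head

q = [1.0, 2.0, 0.5]            # toy feature vector for one query on a new DB
shared = encoder.encode(q)     # 0.5*1.0 + 1.0*2.0 + 2.0*0.5 = 3.5
print(card_head.predict(shared))   # 36.0
print(cost_head.predict(shared))   # 11.0
```

Both heads consume the same shared representation, which is what lets knowledge learned for one task benefit the others; for a new DB, only the small heads would be finetuned rather than retraining everything from scratch.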

Updated: 2021-05-07