SQLFlow: A Bridge between SQL and Machine Learning
arXiv - CS - Databases Pub Date : 2020-01-19 , DOI: arxiv-2001.06846
Yi Wang, Yang Yang, Weiguo Zhu, Yi Wu, Xu Yan, Yongfeng Liu, Yu Wang, Liang Xie, Ziyao Gao, Wenjing Zhu, Xiang Chen, Wei Yan, Mingjie Tang, Yuan Tang

Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow, a system for developing such workflows efficiently in SQL. SQL lets developers write short programs that focus on the purpose (what) and ignore the procedure (how). Previous database systems extended their own SQL dialects to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes a different strategy: it works as a bridge between various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines such as TensorFlow, XGBoost, and scikit-learn. We extended the SQL syntax carefully so that the extension works with various SQL dialects, and we implement the extension with a collaborative parsing algorithm that we devised. SQLFlow is efficient and expressive across a wide variety of ML techniques: supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; and data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.
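
To make the idea of an extended SQL statement concrete, here is a minimal Python sketch that splits one statement into a standard-SQL prefix and an ML clause. It is an illustration only, not the collaborative parsing algorithm from the paper, which the abstract does not detail. The keyword set (`TO TRAIN`, `TO PREDICT`, `TO EXPLAIN`), the helper `split_extended_sql`, and the sample table and model names are assumptions made for this example.

```python
import re
from typing import NamedTuple, Optional

# Assumed extension keywords; the abstract itself does not list them.
_EXTENSION = re.compile(r"\bTO\s+(TRAIN|PREDICT|EXPLAIN)\b", re.IGNORECASE)


class SplitStatement(NamedTuple):
    standard_sql: str          # part a database's own SQL dialect can handle
    ml_clause: Optional[str]   # extension part, or None for plain SQL


def split_extended_sql(statement: str) -> SplitStatement:
    """Split at the first extension keyword (hypothetical helper).

    Simplification: keywords inside string literals or comments are not
    detected, so this is not a real parser.
    """
    text = statement.strip().rstrip(";").strip()
    match = _EXTENSION.search(text)
    if match is None:
        return SplitStatement(text, None)
    return SplitStatement(text[:match.start()].strip(),
                          text[match.start():].strip())


# Hypothetical statement; table and model names are placeholders.
example = (
    "SELECT * FROM iris.train "
    "TO TRAIN DNNClassifier WITH model.n_classes = 3 "
    "LABEL class INTO sqlflow_models.my_dnn_model;"
)
parts = split_extended_sql(example)
print(parts.standard_sql)
print(parts.ml_clause)
```

In this sketch the prefix could be handed unchanged to the underlying database, while the ML clause would be routed to an ML engine; a real system needs proper tokenization for each SQL dialect.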

Updated: 2020-01-22