Two-stage optimization for machine learning workflow, Information Systems


Two-stage optimization for machine learning workflow
Information Systems (IF 3.0), Pub Date: 2019-12-09, DOI: 10.1016/j.is.2019.101483
Alexandre Quemy

Machine learning techniques play a predominant role in dealing with massive amounts of data and are employed in almost every domain. Building a high-quality machine learning model to be deployed in production is a challenging task for both subject matter experts and machine learning practitioners.

For broader adoption and better scalability of machine learning systems, the construction and configuration of machine learning workflows need to become more automated. In the last few years, several techniques have been developed in this direction, collectively known as AutoML.

In this paper, we present a two-stage optimization process to build data pipelines and configure machine learning algorithms. First, we study the impact of data pipelines compared to algorithm configuration in order to show the importance of data preprocessing relative to hyperparameter tuning. Second, we present policies to efficiently allocate search time between data pipeline construction and algorithm configuration. These policies are agnostic to the meta-optimizer. Last, we present a metric to determine whether a data pipeline is specific to or independent of the algorithm, enabling fine-grained pipeline pruning and meta-learning for the cold-start problem.
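
The abstract does not spell out the allocation policies, so the following is only a minimal sketch of the general idea of splitting a search budget between the two stages. Everything in it is an assumption made for illustration: the function names `two_stage_search` and `toy_evaluate`, the fixed `pipeline_share` split, the use of plain random search as the meta-optimizer, the budget counted in evaluations rather than wall-clock time, and the toy scoring function. It is not the paper's method.

```python
import random


def two_stage_search(evaluate, pipeline_space, config_space,
                     budget, pipeline_share, seed=0):
    """Split an evaluation budget between pipeline search and algorithm
    configuration (hypothetical fixed-split policy, random search)."""
    rng = random.Random(seed)
    pipeline_budget = int(budget * pipeline_share)
    config_budget = budget - pipeline_budget

    # Baseline: first pipeline with the default (first) configuration.
    # This baseline evaluation is not counted against the budget.
    default_config = config_space[0]
    best_pipeline, best_config = pipeline_space[0], default_config
    best_score = evaluate(best_pipeline, best_config)

    # Stage 1: search data pipelines with the algorithm left at its default.
    for _ in range(pipeline_budget):
        candidate = rng.choice(pipeline_space)
        score = evaluate(candidate, default_config)
        if score > best_score:
            best_pipeline, best_score = candidate, score

    # Stage 2: tune the algorithm configuration on the best pipeline found.
    for _ in range(config_budget):
        candidate = rng.choice(config_space)
        score = evaluate(best_pipeline, candidate)
        if score > best_score:
            best_config, best_score = candidate, score

    return best_pipeline, best_config, best_score


def toy_evaluate(pipeline, config):
    """Made-up score: the pipeline choice dominates, the configuration
    adds a small bonus that peaks at config == 3."""
    pipeline_score = {"none": 0.0, "scale": 0.3, "scale+pca": 0.5}
    return pipeline_score[pipeline] + 0.1 * (1 - abs(config - 3) / 3)


if __name__ == "__main__":
    pipelines = ["none", "scale", "scale+pca"]
    configs = [1, 2, 3, 4, 5]
    print(two_stage_search(toy_evaluate, pipelines, configs,
                           budget=20, pipeline_share=0.5))
```

Because the search loop only calls `evaluate` and samples from two lists, random search could be swapped for another optimizer without changing the split itself, which is one way to read the claim that the policies are agnostic to the meta-optimizer.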


Updated: 2019-12-09
