AutoML to Date and Beyond: Challenges and Opportunities, ACM Computing Surveys


ACM Computing Surveys (IF 16.6), Pub Date: 2021-10-05, DOI: 10.1145/3470918
Shubhra Kanti Karmaker (“Santu”)₁, Md. Mahadi Hassan₁, Micah J. Smith₂, Lei Xu₂, Chengxiang Zhai₃, Kalyan Veeramachaneni₂


As big data becomes ubiquitous across domains, and more and more stakeholders aspire to make the most of their data, demand for machine learning tools has spurred researchers to explore the possibilities of automated machine learning (AutoML). AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research. But although automation and efficiency are among AutoML’s main selling points, the process still requires human involvement at a number of vital steps, including understanding the attributes of domain-specific data, defining prediction problems, creating a suitable training dataset, and selecting a promising machine learning technique. These steps often require a prolonged back-and-forth that makes this process inefficient for domain experts and data scientists alike and keeps so-called AutoML systems from being truly automatic. In this review article, we introduce a new classification system for AutoML systems, using a seven-tiered schematic to distinguish these systems based on their level of autonomy. We begin by describing what an end-to-end machine learning pipeline actually looks like, and which subtasks of the machine learning pipeline have been automated so far. We highlight those subtasks that are still done manually—generally by a data scientist—and explain how this limits domain experts’ access to machine learning. Next, we introduce our novel level-based taxonomy for AutoML systems and define each level according to the scope of automation support provided. Finally, we lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline and discussing important challenges that stand in the way of this ambitious goal.


Updated: 2021-10-05
