Active Learning Query Strategies for Classification, Regression, and Clustering: A Survey
Journal of Computer Science and Technology (IF 1.9), Pub Date: 2020-07-01, DOI: 10.1007/s11390-020-9487-4
Punit Kumar, Atul Gupta

Data is generally abundant in unlabeled form, and annotating it is costly. Both labeling and learning costs can be reduced by learning from as few labeled instances as possible. Active learning (AL) learns from a few labeled instances and can additionally query an expert annotator, or oracle, for the labels of further instances. The active learner uses an instance selection strategy to choose the critical query instances that reduce the generalization error as quickly as possible. This process yields a refined training dataset, which helps minimize the overall cost. The key to the success of AL is the query strategy, which selects candidate query instances and helps the learner learn a valid hypothesis. This survey reviews AL query strategies for classification, regression, and clustering under the pool-based AL scenario. The query strategies for classification are further divided into informative-based, representative-based, informative- and representative-based, and others. More advanced query strategies based on reinforcement learning and deep learning, along with query strategies under realistic environment settings, are also presented. After a rigorous mathematical analysis of AL strategies, this work presents a comparative analysis of them. Finally, an implementation guide, applications, and challenges of AL are discussed.
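
The pool-based scenario described in the abstract can be illustrated with a minimal sketch. It is not taken from the survey: it assumes entropy-based uncertainty sampling as one example of an informative-based query strategy, and the names `entropy`, `select_query`, `pool_based_al`, `fit`, and `oracle` are hypothetical, chosen only for illustration.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_query(pool_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the pool instance whose prediction is most uncertain."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))


def pool_based_al(
    pool: Sequence[Any],
    labeled: Sequence[Tuple[Any, Any]],
    fit: Callable[[List[Tuple[Any, Any]]], Callable[[Any], Sequence[float]]],
    oracle: Callable[[Any], Any],
    budget: int,
) -> List[Tuple[Any, Any]]:
    """Run a pool-based active learning loop for at most `budget` queries.

    `fit` (assumed) trains on the labeled set and returns a function that maps
    an instance to class probabilities; `oracle` (assumed) returns the true label.
    """
    pool = list(pool)
    labeled = list(labeled)
    for _ in range(budget):
        if not pool:
            break
        predict_proba = fit(labeled)                     # retrain on current labels
        idx = select_query([predict_proba(x) for x in pool])
        x = pool.pop(idx)                                # remove the query from the pool
        labeled.append((x, oracle(x)))                   # ask the oracle for its label
    return labeled
```

Each iteration retrains the model, scores every remaining pool instance, and sends only the most uncertain one to the oracle, so the labeling cost is bounded by `budget`. A representative-based strategy would replace `select_query` with a criterion based on the structure of the pool rather than on prediction uncertainty.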


Updated: 2020-07-01
