On Initial Pools for Deep Active Learning
arXiv - CS - Machine Learning. Pub Date: 2020-11-30, DOI: arXiv-2011.14696
Akshay L Chandra, Sai Vikas Desai, Chaitanya Devaguptapu, Vineeth N Balasubramanian

Active Learning (AL) techniques aim to minimize the training data required to train a model for a given task. Pool-based AL techniques start with a small initial labeled pool and then iteratively pick batches of the most informative samples for labeling. Generally, the initial pool is sampled randomly and labeled to seed the AL iterations. While recent studies have focused on evaluating the robustness of various query functions in AL, little to no attention has been given to the design of the initial labeled pool. Given the recent successes of learning representations in self-supervised/unsupervised ways, we propose to study whether an intelligently sampled initial labeled pool can improve deep AL performance. We will investigate the effect of intelligently sampled initial labeled pools, including those built with self-supervised and unsupervised strategies, on deep AL methods. In this proposal, we describe our experimental setup, implementation details, datasets, and performance metrics, as well as planned ablation studies. If intelligently sampled initial pools improve AL performance, our work could make a positive contribution to boosting AL performance with no additional annotation, to developing datasets with lower annotation cost in general, and to promoting further research into the use of unsupervised learning methods for AL.
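The pool-based AL loop the abstract describes can be sketched in a few lines. The example below is a minimal illustration, not the authors' method: it seeds the initial labeled pool with a simple diversity heuristic (greedy farthest-point sampling in feature space, standing in for an "intelligently sampled" pool over self-supervised embeddings), then runs margin-based uncertainty sampling with a toy nearest-centroid classifier. All function names, the dataset, and the batch sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class dataset; the features stand in for learned representations.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def farthest_point_init(X, k):
    """Diversity-based seeding: greedily pick points far apart in
    feature space (a stand-in for 'intelligent' initial-pool sampling)."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen point.
        d = np.min(np.linalg.norm(X[:, None] - X[idx], axis=2), axis=1)
        idx.append(int(np.argmax(d)))
    return idx

def predict_proba(X_lab, y_lab, X_query):
    """Toy classifier: distance to class centroids, softmax over -distance."""
    cents = np.stack([X_lab[y_lab == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(X_query[:, None] - cents, axis=2)
    p = np.exp(-d)
    return p / p.sum(axis=1, keepdims=True)

labeled = farthest_point_init(X, 4)            # initial pool (labels queried)
unlabeled = [i for i in range(len(X)) if i not in labeled]

for _ in range(5):                             # AL iterations, batch size 2
    proba = predict_proba(X[labeled], y[labeled], X[unlabeled])
    margin = np.abs(proba[:, 0] - proba[:, 1])  # small margin = informative
    batch = np.argsort(margin)[:2]
    for j in sorted(batch, reverse=True):       # pop high indices first
        labeled.append(unlabeled.pop(int(j)))

acc = (predict_proba(X[labeled], y[labeled], X).argmax(axis=1) == y).mean()
```

Replacing `farthest_point_init` with uniform random sampling gives the random-seeding baseline the paper contrasts against; the proposal's question is whether seedings like the diversity-based one above systematically outperform it.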

Updated: 2020-12-01