A Survey of Deep Active Learning
ACM Computing Surveys (IF 16.6), Pub Date: 2021-10-08, DOI: 10.1145/3472291
Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Brij B. Gupta, Xiaojiang Chen, Xin Wang

Active learning (AL) attempts to maximize a model’s performance gain while annotating the fewest samples possible. Deep learning (DL) is greedy for data: it requires a large supply of data to optimize a massive number of parameters if the model is to learn how to extract high-quality features. In recent years, due to the rapid development of internet technology, we have entered an era of information abundance characterized by massive amounts of available data. As a result, DL has attracted significant attention from researchers and has developed rapidly. Compared with DL, however, researchers have shown relatively little interest in AL. This is mainly because, before the rise of DL, traditional machine learning required relatively few labeled samples, meaning that early AL was rarely accorded the value it deserves. Although DL has made breakthroughs in various fields, most of this success is due to a large number of publicly available annotated datasets. However, acquiring a large number of high-quality annotated datasets consumes a lot of manpower, making it infeasible in fields that require high levels of expertise (such as speech recognition, information extraction, and medical images). Consequently, AL is gradually receiving the attention it is due. It is therefore natural to investigate whether AL can be used to reduce the cost of sample annotation while retaining the powerful learning capabilities of DL. As a result of such investigations, deep active learning (DeepAL) has emerged. Although research on this topic is quite abundant, there has not yet been a comprehensive survey of DeepAL-related works; accordingly, this article aims to fill this gap. We provide a formal classification method for the existing work, along with a comprehensive and systematic overview. In addition, we analyze and summarize the development of DeepAL from an application perspective. Finally, we discuss the confusion and problems associated with DeepAL and provide some possible development directions.

Updated: 2021-10-08