Using error decay prediction to overcome practical issues of deep active learning for named entity recognition
Machine Learning (IF 7.5), Pub Date: 2020-08-05, DOI: 10.1007/s10994-020-05897-1
Haw-Shiuan Chang, Shankar Vembu, Sunil Mohan, Rheeya Uppaal, Andrew McCallum

Existing deep active learning algorithms achieve impressive sampling efficiency on natural language processing tasks. However, they exhibit several weaknesses in practice, including (a) inability to use uncertainty sampling with black-box models, (b) lack of robustness to labeling noise, and (c) lack of transparency. In response, we propose a transparent batch active sampling framework by estimating the error decay curves of multiple feature-defined subsets of the data. Experiments on four named entity recognition (NER) tasks demonstrate that the proposed methods significantly outperform diversification-based methods for black-box NER taggers, and can make the sampling process more robust to labeling noise when combined with uncertainty-based methods. Furthermore, the analysis of experimental results sheds light on the weaknesses of different active sampling strategies, and when traditional uncertainty-based or diversification-based methods can be expected to work well.
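
The abstract does not spell out how the error decay curves are estimated or used, so the following is only a rough sketch of the general idea, not the authors' method. It assumes each feature-defined subset's error follows a power law, error(n) = a * n^(-b), in the number n of labeled samples, fits that curve in log-log space, and then greedily spends a labeling batch on whichever subset is predicted to reduce the weighted error most. The group names, the `history` and `pool_share` structures, and the functions `fit_power_law` and `allocate_batch` are all hypothetical.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) - b * log(size); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


def predicted_error(a, b, n):
    """Error predicted by the assumed power-law curve after n labeled samples."""
    return a * n ** (-b)


def allocate_batch(history, pool_share, batch_size):
    """Assign labels one at a time to the subset with the largest predicted error drop.

    history: subset name -> list of (labeled_count, validation_error) observations.
    pool_share: subset name -> fraction of the data that falls in that subset.
    """
    fits = {g: fit_power_law(*zip(*points)) for g, points in history.items()}
    labeled = {g: points[-1][0] for g, points in history.items()}
    allocation = {g: 0 for g in history}

    def gain(g):
        # Predicted weighted error reduction from one more label in subset g.
        a, b = fits[g]
        n = labeled[g] + allocation[g]
        return pool_share[g] * (predicted_error(a, b, n) - predicted_error(a, b, n + 1))

    for _ in range(batch_size):
        best = max(history, key=gain)
        allocation[best] += 1
    return allocation


# Hypothetical subsets defined by a surface feature of the tokens.
history = {
    "capitalized": [(50, 0.40), (100, 0.30), (200, 0.225)],  # error still falling fast
    "lowercase": [(50, 0.10), (100, 0.095), (200, 0.09)],    # error nearly flat
}
pool_share = {"capitalized": 0.5, "lowercase": 0.5}
allocation = allocate_batch(history, pool_share, batch_size=10)
print(allocation)
```

Because the allocation is driven by a fitted curve per subset, each sampling decision can be traced back to a predicted error reduction, which is one way to read the transparency claim in the abstract. Such a scheme also needs only error measurements, not model confidences, so it would apply to a black-box tagger.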

Updated: 2020-08-05