Modeling Sequential Annotations for Sequence Labeling With Crowds, IEEE Transactions on Cybernetics

Modeling Sequential Annotations for Sequence Labeling With Crowds
IEEE Transactions on Cybernetics (IF 11.8), Pub Date: 2021-10-19, DOI: 10.1109/tcyb.2021.3117700
Xiaolei Lu, Tommy W. S. Chow

Crowd sequential annotations can be an efficient and cost-effective way to build large datasets for sequence labeling. Unlike tagging independent instances, crowd sequential annotation produces label sequences whose quality depends on each annotator’s expertise in capturing the internal dependencies of each token in the sequence. In this article, we propose modeling sequential annotation for sequence labeling with crowds (SA-SLC). First, a conditional probabilistic model is developed to jointly model sequential data and annotators’ expertise, in which a categorical distribution is introduced to estimate the reliability of each annotator in capturing local and nonlocal label dependencies for sequential annotation. To accelerate the marginalization of the proposed model, a valid label sequence inference (VLSE) method is proposed to derive the valid ground-truth label sequences from crowd sequential annotations. VLSE derives possible ground-truth labels at the tokenwise level and further prunes subpaths in the forward inference for label sequence decoding. This reduces the number of candidate label sequences and improves the quality of the possible ground-truth label sequences. Experimental results on several natural language processing sequence labeling tasks show the effectiveness of the proposed model.
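The abstract gives no algorithmic detail for VLSE, so the following is only an illustrative sketch of the general idea it describes: collect tokenwise candidate labels from the crowd, then prune invalid subpaths while extending sequences left to right. It is not the authors' method. It assumes BIO-style tags, takes the union of annotators' labels at each position as the candidate set, and uses a simple BIO transition rule as the validity check; the function names `is_valid_transition` and `candidate_sequences` are hypothetical.

```python
from typing import List, Sequence


def is_valid_transition(prev: str, curr: str) -> bool:
    """Assumed validity rule (BIO): I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        entity = curr[2:]
        return prev in (f"B-{entity}", f"I-{entity}")
    return True


def candidate_sequences(annotations: Sequence[Sequence[str]]) -> List[List[str]]:
    """Enumerate label sequences built from crowd labels, pruning invalid subpaths.

    annotations: one label sequence per annotator, all of the same length.
    """
    length = len(annotations[0])
    # Tokenwise candidates: every label that at least one annotator assigned
    # at that position (sorted so the output order is deterministic).
    candidates = [sorted({ann[t] for ann in annotations}) for t in range(length)]

    # Treat the position before the sentence as "O" for the first transition.
    paths = [[label] for label in candidates[0] if is_valid_transition("O", label)]

    # Forward pass: extend each surviving subpath and drop invalid extensions
    # immediately, so they never spawn further candidates.
    for t in range(1, length):
        paths = [
            path + [label]
            for path in paths
            for label in candidates[t]
            if is_valid_transition(path[-1], label)
        ]
    return paths


if __name__ == "__main__":
    crowd = [
        ["B-PER", "I-PER", "O"],  # annotator 1
        ["B-PER", "O", "O"],      # annotator 2
        ["O", "I-PER", "O"],      # annotator 3 (violates the BIO rule)
    ]
    for seq in candidate_sequences(crowd):
        print(seq)
```

In this toy input the tokenwise candidates allow four combinations, and pruning removes the one that starts an `I-PER` span directly after `O`, leaving three candidate sequences. A real system would additionally weight candidates by estimated annotator reliability, which this sketch leaves out.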

Updated: 2021-10-19
