Toward Effective Automated Content Analysis via Crowdsourcing
arXiv - CS - Information Retrieval. Pub Date: 2021-01-12, DOI: arxiv-2101.04615
Jiele Wu, Chau-Wai Wong, Xinyan Zhao, Xianpeng Liu

Many computer scientists use the aggregated answers of online workers to represent ground truth. Prior work has shown that aggregation methods such as majority voting are effective for measuring relatively objective features. For subjective features such as semantic connotation, however, the quality of responses from online workers, who are known for optimizing their hourly earnings, tends to deteriorate the longer they work. In this paper, we aim to address this issue by proposing a quality-aware semantic data annotation system. We observe that with timely feedback on their performance, quantified by quality scores, better-informed online workers can maintain the quality of their labeling over an extended period of time. We validate the effectiveness of the proposed annotation system by i) evaluating performance against an expert-labeled dataset, and ii) demonstrating machine learning tasks that achieve consistent learning behavior with 70%-80% accuracy. Our results suggest that with our system, researchers can collect high-quality answers about subjective semantic features at a large scale.
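The abstract mentions two ideas that are easy to make concrete: majority-vote aggregation of worker labels and a per-worker quality score computed against reference labels. The sketch below is illustrative only and is not the paper's implementation; the function names, the example sentiment labels, and the use of simple agreement as the quality score are assumptions made for the sake of the example.

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Aggregate crowdsourced labels per item by majority vote
    (ties broken by the first most-common label)."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

def quality_score(worker_labels, reference_labels):
    """Fraction of a worker's labels that agree with reference labels
    (e.g., expert annotations) on the items both have labeled."""
    shared = [item for item in worker_labels if item in reference_labels]
    if not shared:
        return 0.0
    agree = sum(worker_labels[i] == reference_labels[i] for i in shared)
    return agree / len(shared)

# Hypothetical example: three workers label two tweets for sentiment.
labels = {
    "tweet_1": ["positive", "positive", "negative"],
    "tweet_2": ["negative", "negative", "negative"],
}
aggregated = majority_vote(labels)          # {'tweet_1': 'positive', 'tweet_2': 'negative'}
worker_a = {"tweet_1": "positive", "tweet_2": "positive"}
print(quality_score(worker_a, aggregated))  # 0.5
```

A score like this, reported back to workers as they annotate, is the kind of timely feedback signal the paper's system builds on; the actual scoring used by the authors may differ.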

Updated: 2021-01-13