A cooperative crowdsourcing framework for knowledge extraction in digital humanities – cases on Tang poetry, Aslib Journal of Information Management

Aslib Journal of Information Management (IF 2.6), Pub Date: 2020-02-23, DOI: 10.1108/ajim-07-2019-0192
Liang Hong, Wenjun Hou, Zonghui Wu, Huijie Han

The purpose of this paper is to propose a knowledge extraction framework that extracts knowledge, including entities and the relationships between them, from unstructured texts in digital humanities (DH).

The proposed cooperative crowdsourcing framework (CCF) uses both human–computer cooperation and crowdsourcing to achieve high-quality, scalable knowledge extraction. CCF integrates active learning with a novel category-based crowdsourcing mechanism to help domain experts label and verify extracted knowledge.

The case study shows that CCF can effectively and efficiently extract knowledge from multi-source heterogeneous data in the field of Tang poetry. Specifically, CCF achieves higher knowledge extraction accuracy than state-of-the-art methods. The active learning mechanism can maximize the contribution of feedback to the training model. The category-based crowdsourcing mechanism can scale up effective human–computer collaboration by taking into account workers' specialization in different categories of tasks.

This research proposes CCF to enable high-quality and scalable knowledge extraction in the field of Tang poetry. CCF can be generalized to other fields of DH by introducing domain knowledge and experts.

The extracted knowledge is machine-understandable and can support research on Tang poetry as well as knowledge-driven intelligent applications in DH.

CCF is the first human-in-the-loop knowledge extraction framework that integrates active learning and crowdsourcing mechanisms. Its human–computer cooperation method uses the feedback of domain experts through the active learning mechanism. Its category-based crowdsourcing mechanism matches the categories of DH data with workers, especially domain experts.
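The abstract names the two mechanisms CCF combines but does not say how either is implemented. The Python sketch below shows one common way each could work. Least-confidence uncertainty sampling stands in for the active learning step. Routing tasks by per-category worker accuracy, followed by a skill-weighted vote, stands in for category-based crowdsourcing. `Candidate`, `Worker`, the helper functions, the category labels, the skill scores and the probabilities are all hypothetical. This is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A hypothetical machine-extracted fact awaiting human verification."""
    text: str
    category: str       # hypothetical task category, e.g. "person", "place", "relation"
    probs: list[float]  # model's predicted class probabilities for this candidate


@dataclass
class Worker:
    """A hypothetical crowd worker with per-category accuracy estimates."""
    name: str
    skill: dict[str, float] = field(default_factory=dict)  # category -> estimated accuracy


def least_confidence(c: Candidate) -> float:
    """Uncertainty score: 1 minus the model's top class probability."""
    return 1.0 - max(c.probs)


def select_for_labeling(pool: list[Candidate], budget: int) -> list[Candidate]:
    """Active learning step: pick the `budget` candidates the model is least sure of."""
    return sorted(pool, key=least_confidence, reverse=True)[:budget]


def assign(tasks: list[Candidate], workers: list[Worker], per_task: int = 2) -> dict[str, list[str]]:
    """Category-based routing: send each task to the workers with the highest
    estimated accuracy in that task's category (unknown categories count as 0)."""
    plan = {}
    for t in tasks:
        ranked = sorted(workers, key=lambda w: w.skill.get(t.category, 0.0), reverse=True)
        plan[t.text] = [w.name for w in ranked[:per_task]]
    return plan


def aggregate(answers: dict[str, bool], workers: dict[str, Worker], category: str) -> bool:
    """Skill-weighted vote over accept/reject answers; True means the fact is accepted.
    Workers with no estimate for the category get a neutral weight of 0.5."""
    score = 0.0
    for name, accept in answers.items():
        weight = workers[name].skill.get(category, 0.5)
        score += weight if accept else -weight
    return score > 0


if __name__ == "__main__":
    # Toy data: texts, categories, probabilities and skills are made up for illustration.
    pool = [
        Candidate("Li Bai -knew-> Du Fu", "relation", [0.55, 0.45]),
        Candidate("Chang'an : place", "place", [0.95, 0.05]),
        Candidate("Li Bai : person", "person", [0.70, 0.30]),
    ]
    workers = [
        Worker("A", {"relation": 0.9, "person": 0.6}),
        Worker("B", {"relation": 0.7, "place": 0.9}),
        Worker("C", {"person": 0.95}),
    ]
    picked = select_for_labeling(pool, budget=2)
    print(assign(picked, workers))
    # prints {'Li Bai -knew-> Du Fu': ['A', 'B'], 'Li Bai : person': ['C', 'A']}
    # Verified labels would then be added to the training set and the model retrained.
```

Least-confidence sampling is only one of several uncertainty criteria. Margin and entropy sampling are common alternatives, and the abstract does not say which criterion, if any, CCF uses.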

Updated: 2020-02-23