Dacura: A new solution to data harvesting and knowledge extraction for the historical sciences
Historical Methods: A Journal of Quantitative and Interdisciplinary History (IF 1.6), Pub Date: 2018-03-20, DOI: 10.1080/01615440.2018.1443863
Peter N. Peregrine 1,2, Rob Brennan 3, Thomas Currie 4, Kevin Feeney 5, Pieter François 6,7, Peter Turchin 8, Harvey Whitehouse 7

ABSTRACT

New advances in computer science address problems historical scientists face in gathering and evaluating the now vast data sources available through the Internet. As an example, we introduce Dacura, a dataset curation platform designed to assist historical researchers in harvesting, evaluating, and curating high-quality information sets from the Internet and other sources. Dacura uses semantic knowledge graph technology to represent data as complex, interrelated knowledge, allowing rapid search and retrieval of highly specific data without the need for a lookup table. Dacura automates the generation of tools that help non-experts curate high-quality knowledge bases over time and integrate data from multiple sources into its curated knowledge model. Together, these features allow rapid harvesting and automated evaluation of Internet resources. We provide an example of Dacura in practice as the software employed to populate and manage the Seshat databank.




Updated: 2018-03-20