History playground: A tool for discovering temporal trends in massive textual corpora
Digital Scholarship in the Humanities (IF 0.7), Pub Date: 2019-06-01, DOI: 10.1093/llc/fqy077
Thomas Lansdall-Welfare, Nello Cristianini

Recent studies have shown that macroscopic patterns of continuity and change over the course of centuries can be detected through the analysis of time series extracted from massive textual corpora. Similar data-driven approaches have already revolutionised the natural sciences, and are widely believed to hold similar potential for the humanities and social sciences, driven by the mass-digitisation projects that are currently under way, and coupled with the ever-increasing number of documents which are "born digital". As such, new interactive tools are required to discover and extract macroscopic patterns from these vast quantities of textual data. Here we present History Playground, an interactive web-based tool for discovering trends in massive textual corpora. The tool makes use of scalable algorithms to first extract trends from textual corpora, before making them available for real-time search and discovery, presenting users with an interface to explore the data. Included in the tool are algorithms for standardization, regression, change-point detection in the relative frequencies of ngrams, multi-term indices and comparison of trends across different corpora.
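
To make the listed operations concrete, here is a minimal Python sketch of how a yearly relative-frequency series for one n-gram might be built, standardised, fitted with a linear trend, and scanned for a single change point. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy counts, and the least-squares single-split change-point rule are all hypothetical choices, since the abstract does not specify which algorithms the tool uses.

```python
from statistics import mean, pstdev

def relative_frequency(counts, totals):
    """Per-year relative frequency: n-gram count divided by total n-grams that year."""
    return {year: counts.get(year, 0) / totals[year] for year in sorted(totals)}

def standardize(series):
    """Z-score a list of values (zero mean, unit population variance)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]

def linear_trend(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def single_change_point(ys):
    """Index that best splits ys into two constant-mean segments (least squares).

    Assumption: one change in mean; this is an illustrative stand-in, not
    necessarily the change-point method used by History Playground.
    """
    def sse(segment):
        m = mean(segment)
        return sum((v - m) ** 2 for v in segment)

    best_idx, best_cost = None, float("inf")
    for i in range(1, len(ys)):
        cost = sse(ys[:i]) + sse(ys[i:])
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

# Toy data (invented for illustration): yearly counts of one n-gram
# and the total number of n-grams in the corpus for each year.
counts = {1900: 2, 1901: 2, 1902: 2, 1903: 8, 1904: 8, 1905: 8}
totals = {year: 1000 for year in counts}

freq = relative_frequency(counts, totals)
years = list(freq)
values = [freq[y] for y in years]

z_scores = standardize(values)
slope, intercept = linear_trend(years, values)
change_idx = single_change_point(values)
change_year = years[change_idx]
```

On the toy series, `change_year` marks the first year of the higher-frequency segment. Standardising before comparison puts terms with very different base frequencies on the same scale, which is presumably why it appears alongside cross-corpus comparison in the feature list.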

Updated: 2019-06-01