Beyond lexical frequencies: using R for text analysis in the digital humanities
Language Resources and Evaluation (IF 2.7), Pub Date: 2019-04-08, DOI: 10.1007/s10579-019-09456-6
Taylor Arnold, Nicolas Ballier, Paula Lissón, Lauren Tilton

This paper presents a combination of R packages (user-contributed toolkits written in a common core programming language) to facilitate the humanistic investigation of digitised, text-based corpora. Our survey of text analysis packages includes those of our own creation (cleanNLP and fasttextM) as well as packages built by other research groups (stringi, readtext, hyphenatr, quanteda, and hunspell). By operating on generic object types, these packages unite research innovations in corpus linguistics, natural language processing, machine learning, statistics, and digital humanities. We begin by extrapolating on the theoretical benefits of R as an elaborate gluing language for bringing together several areas of expertise, and we compare it to linguistic concordancers and other tool-based approaches to text analysis in the digital humanities. We then showcase the practical benefits of such an ecosystem by illustrating how R packages have been integrated into a digital humanities project. Throughout, the focus is on moving beyond the bag-of-words, lexical frequency model by incorporating linguistically driven analyses in research.

Updated: 2019-04-08