BasiScript
International Journal of Corpus Linguistics (IF 1.6), Pub Date: 2018-12-27, DOI: 10.1075/ijcl.17086.tel
Agnes Tellings, Nelleke Oostdijk, Iris Monster, Franc Grootjen, Antal van den Bosch

This short paper introduces BasiScript, a 9-million-word corpus of contemporary Dutch texts written by primary school children. The data were collected over three years with 17,216 children contributing texts throughout this period. Each word token in the corpus is annotated with the correct orthographical form, the associated lemma and the part of speech. The most frequent polysemous words have been annotated for word meaning, while all words in the lexicon that was derived from the BasiScript corpus have been annotated for corpus and subcorpora frequency, dispersion, length, family size, family frequency, orthographic neighborhood size, and orthographic neighborhood frequency. Images of the texts are available to researchers. The present article describes the corpus and presents a comparison of BasiScript with BasiLex (a Dutch corpus with texts primary school children are likely to read, completed in 2015) by means of frequency profiling.

Updated: 2018-12-27