The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions, Language Resources and Evaluation

Language Resources and Evaluation (IF 1.7), Pub Date: 2020-09-04, DOI: 10.1007/s10579-020-09503-7
Richard Futrell, Edward Gibson, Harry J Tily, Idan Blank, Anastasia Vishnevetsky, Steven T Piantadosi, Evelina Fedorenko

It is now a common practice to compare models of human language processing by comparing how well they predict behavioral and neural measures of processing difficulty, such as reading times, on corpora of rich naturalistic linguistic materials. However, many of these corpora, which are based on naturally-occurring text, do not contain many of the low-frequency syntactic constructions that are often required to distinguish between processing theories. Here we describe a new corpus consisting of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected Penn Treebank-style parse trees and includes self-paced reading time data and aligned audio recordings. We give an overview of the content of the corpus, review recent work using the corpus, and release the data.


Updated: 2020-09-05
