A Principled Approach to Feature Selection in Models of Sentence Processing
Cognitive Science (IF 2.617), Pub Date: 2020-12-11, DOI: 10.1111/cogs.12918
Garrett Smith, Shravan Vasishth

Among theories of human language comprehension, cue‐based memory retrieval has proven to be a useful framework for understanding when and how processing difficulty arises in the resolution of long‐distance dependencies. Most previous work in this area has assumed that very general retrieval cues like [+subject] or [+singular] do the work of identifying (and sometimes misidentifying) a retrieval target in order to establish a dependency between words. However, recent work suggests that general, handpicked retrieval cues like these may not be enough to explain illusions of plausibility (Cunnings & Sturt, 2018), which can arise in sentences like "The letter next to the porcelain plate shattered." Capturing such retrieval interference effects requires lexically specific features and retrieval cues, but handpicking the features is hard to do in a principled way and greatly increases modeler degrees of freedom. To remedy this, we use well‐established word embedding methods for creating distributed lexical feature representations that encode information relevant for retrieval using distributed retrieval cue vectors. We show that the similarity between the feature and cue vectors (a measure of plausibility) predicts total reading times in Cunnings and Sturt’s eye‐tracking data. The features can easily be plugged into existing parsing models (including cue‐based retrieval and self‐organized parsing), putting very different models on more equal footing and facilitating future quantitative comparisons.
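
As a rough illustration of the idea that similarity between a feature vector and a cue vector can serve as a plausibility measure, here is a minimal Python sketch. It assumes cosine similarity as the similarity measure, which the abstract does not specify, and every vector below is a made-up toy value rather than an embedding from the paper. The names `cosine_similarity`, `features`, and `cue_shattered` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 4-dimensional feature vectors for the two nouns in the example
# sentence. These numbers are invented for illustration only.
features = {
    "letter": [0.1, 0.9, 0.2, 0.0],
    "plate": [0.8, 0.1, 0.7, 0.1],
}

# Toy retrieval cue vector for the verb "shattered" (also invented).
cue_shattered = [0.9, 0.0, 0.6, 0.2]

# Score each noun by its similarity to the verb's cue vector.
scores = {
    word: cosine_similarity(vec, cue_shattered)
    for word, vec in features.items()
}

# The noun with the highest score is the most plausible match for the cue.
best = max(scores, key=scores.get)
print(scores, best)
```

With these toy values, the distractor "plate" scores higher than the grammatical subject "letter", which mirrors the kind of lexically driven interference the abstract describes. Real use would substitute trained word embeddings for the invented vectors.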

Updated: 2020-12-11