DISCOver: DIStributional approach based on syntactic dependencies for discovering COnstructions
Corpus Linguistics and Linguistic Theory (IF 2.143), Pub Date: 2019-01-04, DOI: 10.1515/cllt-2018-0028
Maria Antònia Martí 1, Mariona Taulé 2, Venelin Kovatchev 1, Maria Salamó 3

Abstract One of the goals in Cognitive Linguistics is the automatic identification and analysis of constructions, since they are fundamental linguistic units for understanding language. This article presents DISCOver, an unsupervised methodology for the automatic discovery of lexico-syntactic patterns that can be considered candidates for constructions. The methodology follows a distributional semantic approach. Concretely, it is based on our proposed pattern-construction hypothesis: those contexts that are relevant to the definition of a cluster of semantically related words tend to be (part of) lexico-syntactic constructions. Our proposal uses Distributional Semantic Models to model context, taking syntactic dependencies into account. After a clustering process, we linked all clusters with strong relationships and used them as a source of information for deriving lexico-syntactic patterns, obtaining a total of 220,732 candidates from a 100-million-token corpus of Spanish. We evaluated the patterns intrinsically, applying statistical association measures, and experts also evaluated them qualitatively. Our results were superior to the baseline in both quality and quantity in all cases. Although our experiments were carried out on a Spanish corpus, the methodology is language independent; to be applied, it only requires a large corpus annotated with parts of speech and dependencies.
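
The pipeline summarized in the abstract (dependency-based context vectors, clustering of related words, and reading candidate patterns off the contexts a cluster shares) can be illustrated with a toy sketch. This is not the authors' implementation: the triples, the relation label `dobj`, the cosine threshold, the greedy single-link grouping, and all function names below are assumptions made only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy (head lemma, relation, dependent lemma) triples, invented for illustration.
TRIPLES = [
    ("beber", "dobj", "vino"),
    ("beber", "dobj", "cerveza"),
    ("beber", "dobj", "agua"),
    ("servir", "dobj", "vino"),
    ("servir", "dobj", "cerveza"),
    ("leer", "dobj", "libro"),
    ("leer", "dobj", "novela"),
    ("escribir", "dobj", "libro"),
    ("escribir", "dobj", "novela"),
]


def build_vectors(triples):
    """Map each dependent word to counts of its (relation, head) contexts."""
    vectors = defaultdict(Counter)
    for head, rel, dep in triples:
        vectors[dep][(rel, head)] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy single-link grouping (an assumed stand-in for the paper's clustering).

    A word joins the first cluster that contains a sufficiently similar member.
    """
    clusters = []
    for word in sorted(vectors):
        for members in clusters:
            if any(cosine(vectors[word], vectors[m]) >= threshold for m in members):
                members.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def shared_contexts(members, vectors):
    """Contexts shared by every cluster member: candidate pattern slots."""
    return set.intersection(*(set(vectors[w]) for w in members))


if __name__ == "__main__":
    vecs = build_vectors(TRIPLES)
    for members in cluster(vecs):
        print(members, sorted(shared_contexts(members, vecs)))
```

In this toy data, the drink nouns and the reading-matter nouns end up in separate groups, and the contexts each group shares (for example, being the object of "beber") play the role of pattern candidates. A real setting would need a parsed corpus, association weighting, and a proper clustering algorithm.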

Updated: 2019-01-04