Pitting corpus-based classification models against each other: a case study for predicting constructional choice in written Estonian, Corpus Linguistics and Linguistic Theory

Corpus Linguistics and Linguistic Theory (IF 1.0), Pub Date: 2017-12-07, DOI: 10.1515/cllt-2016-0010
Jane Klavan

Abstract: In the context of constructional alternatives, we may assume that speakers’ choice between alternative forms is influenced by a multitude of factors. At the moment, multivariate statistical classification modelling seems to be the best tool available to capture this knowledge quantitatively. There is a vast array of techniques available. In this paper, two distinct modelling techniques are applied – logistic regression and naïve discriminative learning – to predict the choice between two constructional alternatives in written Estonian. One of the central questions in statistical modelling concerns the evaluation of model fit. It is proposed that for linguistic analysis, the performance of alternative corpus-based models can be evaluated by, first, pitting them against each other and second, pitting them against experimental data. Previous work on modelling constructional and lexical choice has focused on one of the two aspects. The present paper takes this line of analysis further by combining the two approaches.

Updated: 2017-12-07
