Methodological differences matter: Identification thresholds and corpus composition in lexical bundle research
Southern African Linguistics and Applied Language Studies (IF 0.560). Pub Date: 2020-12-15. DOI: 10.2989/16073614.2020.1858897
Fan Pan

Abstract

In lexical bundle research, it has been common practice to extract and compare lexical bundles across different corpora based on certain identification thresholds. This line of study adopts varying frequency and dispersion thresholds because the corpora compared always differ in size and/or number of texts. However, few studies have considered the consequences of these methodological differences. To bridge this gap, a series of experiments was conducted to explore the impact of identification thresholds and corpus composition on bundle extraction and on the results of cross-corpus comparison. The first set of experiments demonstrated that different identification thresholds applied to the same pair of corpora may yield conflicting results, indicating that methodological differences could be one source of the mixed results in the literature. Further, after removing the influence of differences in corpus size and/or number of texts, the second set of experiments revealed that increasing dispersion thresholds proportionally to offset differences in the number of texts actually favours the corpus with fewer texts. This study highlights the interactive relationship between frequency thresholds and dispersion thresholds and the key role of dispersion thresholds in filtering bundles. The article also discusses the methodological implications for future contrastive lexical bundle research.
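The threshold-based extraction practice described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's procedure: the function names `extract_bundles` and `proportional_min_texts`, the four-word bundle length, the 20-per-million frequency cut-off and the 5-text dispersion default are all hypothetical choices. The sketch counts n-grams, converts a normalized (per-million-word) frequency threshold into a raw count for the corpus at hand, keeps only n-grams that also occur in a minimum number of texts, and shows one way a dispersion threshold might be scaled in proportion to the number of texts in each corpus.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Bundle = Tuple[str, ...]


def extract_bundles(
    texts: List[List[str]],
    n: int = 4,
    freq_per_million: float = 20.0,  # hypothetical frequency threshold
    min_texts: int = 5,  # hypothetical dispersion (range) threshold
) -> Dict[Bundle, int]:
    """Keep n-grams that pass both a normalized frequency threshold
    and a dispersion threshold (minimum number of texts)."""
    raw_counts: Counter = Counter()
    text_range: Dict[Bundle, set] = defaultdict(set)  # n-gram -> indices of texts containing it
    total_tokens = sum(len(tokens) for tokens in texts)

    for idx, tokens in enumerate(texts):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i : i + n])
            raw_counts[gram] += 1
            text_range[gram].add(idx)

    # The same per-million threshold means a different raw count in corpora of different sizes.
    min_raw = freq_per_million * total_tokens / 1_000_000
    return {
        gram: count
        for gram, count in raw_counts.items()
        if count >= min_raw and len(text_range[gram]) >= min_texts
    }


def proportional_min_texts(num_texts: int, percent: int) -> int:
    """Hypothetical helper: dispersion threshold as a whole-number percentage
    of a corpus's texts, rounded up with integer arithmetic (at least 1)."""
    return max(1, (num_texts * percent + 99) // 100)


if __name__ == "__main__":
    toy_corpus = [
        "on the other hand we see".split(),
        "on the other hand it is".split(),
        "at the end of the day".split(),
    ]
    print(extract_bundles(toy_corpus, n=4, min_texts=2))
    # {('on', 'the', 'other', 'hand'): 2}

    # A 5% dispersion threshold scaled to corpora of 100 and 500 texts
    print(proportional_min_texts(100, 5), proportional_min_texts(500, 5))
    # 5 25
```

In a toy corpus this small, the per-million cut-off converts to a fraction of one occurrence, so the dispersion threshold alone does the filtering. In real corpora the two thresholds interact, and how they do so depends on corpus size and the number of texts.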




Updated: 2021-02-12