Outliers in L2 Research in Applied Linguistics: A Synthesis and Data Re-Analysis, Annual Review of Applied Linguistics



Outliers in L2 Research in Applied Linguistics: A Synthesis and Data Re-Analysis
Annual Review of Applied Linguistics (IF 2.8), Pub Date: 2020-06-30, DOI: 10.1017/s0267190520000057
Christopher Nicklin, Luke Plonsky

Data from self-paced reading (SPR) tasks are routinely checked for statistical outliers (Marsden, Thompson, & Plonsky, 2018). Such data points can be handled in a variety of ways (e.g., trimming, data transformation), each of which may influence study results in a different manner. This two-phase study sought, first, to systematically review outlier handling techniques found in studies that involve SPR and, second, to re-analyze raw data from SPR tasks to understand the impact of those techniques. Toward these ends, in Phase I, a sample of 104 studies that employed SPR tasks was collected and coded for different outlier treatments. As found in Marsden et al. (2018), wide variability was observed across the sample in terms of selection of time and standard deviation (SD)-based boundaries for determining what constitutes a legitimate reading time (RT). In Phase II, the raw data from the SPR studies in Phase I were requested from the authors. Nineteen usable datasets were obtained and re-analyzed using data transformations, SD boundaries, trimming, and winsorizing, in order to test their relative effectiveness for normalizing SPR reaction time data. The results suggested that, in the vast majority of cases, logarithmic transformation circumvented the need for SD boundaries, which blindly eliminate or alter potentially legitimate data. The results also indicated that choice of SD boundary had little influence on the data and revealed no meaningful difference between trimming and winsorizing, implying that blindly removing data from SPR analyses might be unnecessary. Suggestions are provided for future research involving SPR data and the handling of outliers in second language (L2) research more generally.
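The treatments named in the abstract (SD boundaries, trimming, winsorizing, and logarithmic transformation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the 2.5 SD cutoff, the sample reading times, and the helper names (`sd_bounds`, `trim`, `winsorize`, `log_transform`, `skewness`) are all hypothetical.

```python
import math
import statistics


def sd_bounds(values, k=2.5):
    """Return (lower, upper) cutoffs at mean -/+ k sample SDs (k is an assumed value)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd


def trim(values, k=2.5):
    """Trimming: drop every value that falls outside the SD boundary."""
    lo, hi = sd_bounds(values, k)
    return [v for v in values if lo <= v <= hi]


def winsorize(values, k=2.5):
    """Winsorizing: replace values outside the SD boundary with the boundary itself."""
    lo, hi = sd_bounds(values, k)
    return [min(max(v, lo), hi) for v in values]


def log_transform(values):
    """Natural-log transform; requires strictly positive reading times."""
    return [math.log(v) for v in values]


def skewness(values):
    """Population skewness (third standardized moment) as a rough normality check."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Made-up, right-skewed reading times in milliseconds (not data from the study).
rts = [250, 280, 300, 320, 350, 380, 420, 470, 540, 640, 800, 1100, 2900]

trimmed = trim(rts)          # the extreme value is removed
winsorized = winsorize(rts)  # the extreme value is pulled in to the upper cutoff
logged = log_transform(rts)  # no data points are removed or altered in rank

print("n raw / trimmed / winsorized:", len(rts), len(trimmed), len(winsorized))
print("skewness raw:", round(skewness(rts), 2))
print("skewness log:", round(skewness(logged), 2))
```

In this toy sample, trimming shortens the list, winsorizing keeps its length but caps the largest value, and the log transform reduces skewness while keeping every observation. Whether the same holds for a real SPR dataset would need to be checked on that dataset.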


Updated: 2020-06-30
