Do data from Mechanical Turk subjects replicate accuracy, response time, and diffusion modeling results?
Behavior Research Methods (IF 5.953), Pub Date: 2021-04-06, DOI: 10.3758/s13428-021-01573-x
Roger Ratcliff 1 , Andrew T Hendrickson 2

Online data collection is being used more and more, especially in the face of the COVID crisis. To examine the quality of such data, we chose to replicate lexical decision and item recognition paradigms from Ratcliff et al. (Cognitive Psychology, 60, 127-157, 2010) and numerosity discrimination paradigms from Ratcliff and McKoon (Psychological Review, 125, 183-217, 2018) with subjects recruited from Amazon Mechanical Turk (AMT). Along with these tasks, we collected data from either an IQ test or a math computation test. Subjects in the lexical decision and item recognition tasks were relatively well-behaved, with only a few giving a significant number of responses with response times (RTs) under 300 ms at chance accuracy, i.e., fast guesses, and a few with unstable RTs across a session. But in the numerosity discrimination tasks, almost half of the subjects gave a significant number of fast guesses and/or unstable RTs across the session. Diffusion model parameters were largely consistent with the earlier studies as were correlations across tasks and correlations with IQ and age. One surprising result was that eliminating fast outliers from subjects with highly variable RTs (those eliminated from the main analyses) produced diffusion model analyses that showed patterns of correlations similar to the subjects with stable performance. Methods for displaying data to examine stability, eliminating subjects, and implementing RT data collection on AMT including checks on timing are also discussed.
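The fast-guess criterion described in the abstract (responses under 300 ms at chance accuracy) could be screened for along the following lines. This is a minimal sketch, not the authors' procedure: the 300 ms cutoff comes from the abstract, but the `Trial` type, the function names, and the `max_prop_fast`, `chance`, and `tolerance` thresholds are hypothetical choices made for illustration, since the abstract does not say what counts as a "significant number" of fast guesses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Trial:
    rt_ms: float    # response time in milliseconds
    correct: bool   # whether the response was correct


def fast_guess_summary(
    trials: Sequence[Trial], cutoff_ms: float = 300.0
) -> Tuple[float, Optional[float]]:
    """Return (proportion of fast trials, accuracy among fast trials).

    Accuracy is None when no trial is faster than the cutoff.
    """
    if not trials:
        return 0.0, None
    fast = [t for t in trials if t.rt_ms < cutoff_ms]
    prop_fast = len(fast) / len(trials)
    acc_fast = sum(t.correct for t in fast) / len(fast) if fast else None
    return prop_fast, acc_fast


def is_fast_guesser(
    trials: Sequence[Trial],
    cutoff_ms: float = 300.0,     # cutoff stated in the abstract
    max_prop_fast: float = 0.05,  # assumed threshold, not from the paper
    chance: float = 0.5,          # chance level for a two-choice task
    tolerance: float = 0.1,       # assumed band around chance, not from the paper
) -> bool:
    """Flag a subject whose fast responses are frequent and near chance."""
    prop_fast, acc_fast = fast_guess_summary(trials, cutoff_ms)
    if acc_fast is None:
        return False
    return prop_fast > max_prop_fast and abs(acc_fast - chance) <= tolerance
```

A subject would be flagged only when both conditions hold, so a subject who is fast but accurate is not treated as guessing. In practice the thresholds would need to be set per task and checked against plots of RT across the session.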




Updated: 2021-04-08