Detecting Careless Responding in Survey Data Using Stochastic Gradient Boosting, Educational and Psychological Measurement


Educational and Psychological Measurement (IF 2.1), Pub Date: 2021-04-19, DOI: 10.1177/00131644211004708
Ulrich Schroeders, Christoph Schmidt, Timo Gnambs

Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements. Different approaches have been proposed to detect aberrant responses such as probing questions that directly assess test-taking behavior (e.g., bogus items), auxiliary or paradata (e.g., response times), or data-driven statistical techniques (e.g., Mahalanobis distance). In the present study, gradient boosted trees, a state-of-the-art machine learning technique, are introduced to identify careless respondents. The performance of the approach was compared with established techniques previously described in the literature (e.g., statistical outlier methods, consistency analyses, and response pattern functions) using simulated data and empirical data from a web-based study, in which diligent versus careless response behavior was experimentally induced. In the simulation study, gradient boosting machines outperformed traditional detection mechanisms in flagging aberrant responses. However, this advantage did not transfer to the empirical study. In terms of precision, the results of both traditional and the novel detection mechanisms were unsatisfactory, although the latter incorporated response times as additional information. The comparison between the results of the simulation and the online study showed that responses in real-world settings seem to be much more erratic than can be expected from the simulation studies. We critically discuss the generalizability of currently available detection methods and provide an outlook on future research on the detection of aberrant response patterns in survey research.
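As a rough illustration of what a "response pattern function" can look like, the sketch below computes a longstring index (the longest run of identical consecutive answers) and the precision of the resulting flags. It is not the authors' procedure: the function names, the cutoff `max_run`, and the toy responses are assumptions made up for this example, and the study's gradient boosting models are not reproduced here.

```python
def longstring(responses):
    """Length of the longest run of identical consecutive answers."""
    if not responses:
        return 0
    longest = current = 1
    for prev, cur in zip(responses, responses[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


def flag_by_longstring(rows, max_run):
    """Flag a respondent when the longest run reaches max_run (hypothetical cutoff)."""
    return [longstring(row) >= max_run for row in rows]


def precision(flags, truth):
    """Share of flagged respondents who really were careless; None if nobody is flagged."""
    flagged = [t for f, t in zip(flags, truth) if f]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


# Toy data (invented): three respondents answering eight 5-point items.
rows = [
    [1, 4, 2, 5, 3, 2, 4, 1],  # varied answers
    [3, 3, 3, 3, 3, 3, 3, 3],  # straightlining
    [2, 2, 5, 1, 4, 4, 3, 5],  # random-looking answers
]
truth = [False, True, True]    # assumed "careless" labels for the toy data

flags = flag_by_longstring(rows, max_run=6)
print(flags, precision(flags, truth))
```

In this toy case the straightliner is caught while the random-looking respondent is missed, which hints at why a single index may struggle with erratic real-world response behavior.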


Updated: 2021-04-21
