Screening Smarter, Not Harder: A Comparative Analysis of Machine Learning Screening Algorithms and Heuristic Stopping Criteria for Systematic Reviews in Educational Research, Educational Psychology Review

Educational Psychology Review (IF 10.1), Pub Date: 2024-02-08, DOI: 10.1007/s10648-024-09862-5
Diego G. Campos, Tim Fütterer, Thomas Gfrörer, Rosa Lavelle-Hill, Kou Murayama, Lars König, Martin Hecht, Steffen Zitzmann, Ronny Scherer

Abstract

Systematic reviews and meta-analyses are crucial for advancing research, yet they are time-consuming and resource-demanding. Although machine learning and natural language processing algorithms may reduce this time and these resources, their performance has not been tested in education and educational psychology, and there is a lack of clear information on when researchers should stop the reviewing process. In this study, we conducted a retrospective screening simulation using 27 systematic reviews in education and educational psychology. We evaluated the sensitivity, specificity, and estimated time savings of several learning algorithms and heuristic stopping criteria. The results showed, on average, a 58% (SD = 19%) reduction in the screening workload of irrelevant records when using learning algorithms for abstract screening and an estimated time savings of 1.66 days (SD = 1.80). The learning algorithm random forests with sentence bidirectional encoder representations from transformers outperformed other algorithms. This finding emphasizes the importance of incorporating semantic and contextual information during feature extraction and modeling in the screening process. Furthermore, we found that 95% of all relevant abstracts within a given dataset can be retrieved using heuristic stopping rules. Specifically, an approach that stops the screening process after classifying 20% of records and consecutively classifying 5% of irrelevant papers yielded the most significant gains in terms of specificity (M = 42%, SD = 28%). However, the performance of the heuristic stopping criteria depended on the learning algorithm used and the length and proportion of relevant papers in an abstract collection. Our study provides empirical evidence on the performance of machine learning screening algorithms for abstract screening in systematic reviews in education and educational psychology.
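To make the heuristic stopping rule concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the rule means "stop once at least 20% of records have been screened and the most recent run of consecutive irrelevant records is at least 5% of the collection". It also assumes that sensitivity is the share of relevant records found before stopping and that specificity is the share of irrelevant records left unscreened. The function names, parameter names, and the toy ranking are hypothetical.

```python
def heuristic_stop_index(labels, min_screened_pct=20, irrelevant_run_pct=5):
    """Return how many records are screened before the heuristic stops.

    labels: 0/1 relevance labels in the order the learning algorithm
            presents records to the reviewer (1 = relevant).
    Assumed rule: stop once at least `min_screened_pct` percent of records
    are screened AND the current run of consecutive irrelevant records is
    at least `irrelevant_run_pct` percent of the collection.
    """
    n = len(labels)
    # Integer ceiling division avoids floating-point rounding surprises.
    min_screened = -(-n * min_screened_pct // 100)
    run_needed = max(1, -(-n * irrelevant_run_pct // 100))
    run = 0
    for screened, label in enumerate(labels, start=1):
        run = 0 if label == 1 else run + 1
        if screened >= min_screened and run >= run_needed:
            return screened
    return n  # the rule never fired, so everything was screened


def screening_metrics(labels, stop):
    """Sensitivity and specificity under the assumptions stated above."""
    total_relevant = sum(labels)
    total_irrelevant = len(labels) - total_relevant
    found_relevant = sum(labels[:stop])
    unscreened = labels[stop:]
    skipped_irrelevant = len(unscreened) - sum(unscreened)
    sensitivity = found_relevant / total_relevant if total_relevant else float("nan")
    specificity = skipped_irrelevant / total_irrelevant if total_irrelevant else float("nan")
    return sensitivity, specificity


# Toy ranking of 100 records: the relevant ones are surfaced early.
ranked = [1] * 8 + [0, 1, 0, 0, 1] + [0] * 87
stop = heuristic_stop_index(ranked)
sensitivity, specificity = screening_metrics(ranked, stop)
print(f"stopped after {stop} records, "
      f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

In this sketch the rule only performs well when the ranking surfaces relevant records early. If a relevant record sits behind a long run of irrelevant ones, the rule stops before reaching it, which is consistent with the abstract's remark that performance depends on the learning algorithm and on the size and relevance proportion of the collection.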


Updated: 2024-02-09
