Evaluating human versus machine learning performance in classifying research abstracts

Scientometrics (IF 3.5), Pub Date: 2020-07-18, DOI: 10.1007/s11192-020-03614-2
Yeow Chong Goh ₁, Xin Qing Cai ₁, Walter Theseira ₂, Giovanni Ko ₃, Khiam Aik Khor ₄

We study whether humans or machine learning (ML) classification models are better at classifying scientific research abstracts according to a fixed set of discipline groups. We recruit both undergraduate and postgraduate assistants for this task in separate stages, and compare their performance against the support vector machine (SVM) ML algorithm at assigning European Research Council Starting Grant project abstracts to their actual evaluation panels, which are organised by discipline groups. On average, ML is more accurate than human classifiers across a variety of training and test datasets and across evaluation panels. ML classifiers trained on different training sets are also more reliable than human classifiers: different ML classifiers assign the same classification to any given abstract more consistently than different human classifiers do. While the top five percentile of human classifiers can outperform ML in limited cases, selecting and training such classifiers is likely costly and difficult compared to training ML models. Our results suggest that ML models are a cost-effective and highly accurate method for addressing problems in comparative bibliometric analysis, such as harmonising the discipline classifications of research from different funding agencies or countries.
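
The two quantities the abstract compares, accuracy against the actual evaluation panel and consistency between different classifiers, can be illustrated with a minimal Python sketch. This is not the authors' code. The agreement measure used here (mean pairwise agreement) is an assumption chosen for simplicity, and the abstract does not say which reliability statistic the paper uses. The function names, the panel codes, and all labels below are hypothetical toy values, not data or results from the study.

```python
from itertools import combinations


def accuracy(predicted, actual):
    """Share of abstracts whose predicted panel equals the actual panel."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def mean_pairwise_agreement(label_sets):
    """Average, over all pairs of classifiers, of the share of abstracts
    on which both classifiers in the pair assign the same panel.

    Assumption: a simple stand-in for "reliability"; the paper may use
    a different statistic.
    """
    pairs = list(combinations(label_sets, 2))
    if not pairs:
        raise ValueError("need at least two classifiers")
    return sum(accuracy(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: panel labels for five abstracts.
actual = ["PE1", "LS2", "SH1", "PE1", "LS2"]

# Three hypothetical ML classifiers (e.g. trained on different training sets).
ml_runs = [
    ["PE1", "LS2", "SH1", "PE1", "SH1"],
    ["PE1", "LS2", "SH1", "PE1", "LS2"],
    ["PE1", "LS2", "SH1", "PE2", "LS2"],
]

# Three hypothetical human classifiers.
humans = [
    ["PE1", "SH1", "SH1", "LS2", "LS2"],
    ["PE2", "LS2", "SH1", "PE1", "SH1"],
    ["PE1", "LS2", "PE1", "PE1", "LS2"],
]

if __name__ == "__main__":
    for name, group in (("ML", ml_runs), ("human", humans)):
        mean_acc = sum(accuracy(run, actual) for run in group) / len(group)
        agreement = mean_pairwise_agreement(group)
        print(f"{name}: mean accuracy={mean_acc:.2f}, agreement={agreement:.2f}")
```

Keeping the two measures separate mirrors the abstract's distinction: a group of classifiers can agree with each other often while still being wrong about the actual panel, so accuracy and consistency are reported independently.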

Updated: 2020-07-18