Constructing and evaluating automated literature review systems, Scientometrics

Scientometrics (IF 3.5), Pub Date: 2020-06-03, DOI: 10.1007/s11192-020-03490-w
Jason Portenoy, Jevin D. West

Automated literature reviews have the potential to accelerate knowledge synthesis and provide new insights. However, a lack of labeled ground-truth data has made it difficult to develop and evaluate these methods. We propose a framework that uses the reference lists from existing review papers as labeled data, which can then be used to train supervised classifiers, allowing for experimentation and testing of models and features at a large scale. We demonstrate our framework by training classifiers using different combinations of citation- and text-based features on 500 review papers. We use the R-Precision scores for the task of reconstructing the review papers’ reference lists as a way to evaluate and compare methods. We also extend our method, generating a novel set of articles relevant to the fields of misinformation studies and science communication. We find that our method can identify many of the most relevant papers for a literature review from a large set of candidate papers, and that our framework allows for development and testing of models and features to incrementally improve the results. The models we build are able to identify relevant papers even when starting with a very small set of seed papers. We also find that the methods can be adapted to identify previously undiscovered articles that may be relevant to a given topic.
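To make the evaluation measure concrete, here is a minimal sketch of R-Precision under its standard information-retrieval definition: if a review has R relevant (held-out) references, the score is the fraction of the top R ranked candidates that are among them. The function name, the paper identifiers, and the toy ranking are illustrative assumptions, not taken from the paper, and the sketch assumes each candidate appears at most once in the ranking.

```python
from typing import Sequence, Set


def r_precision(ranked_candidates: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the top-R ranked candidates that are relevant,
    where R is the number of relevant items.

    Assumes `ranked_candidates` is ordered best-first and contains no
    duplicate identifiers (hypothetical input format).
    """
    r = len(relevant)
    if r == 0:
        raise ValueError("R-Precision is undefined when there are no relevant items")
    top_r = ranked_candidates[:r]
    hits = sum(1 for paper_id in top_r if paper_id in relevant)
    return hits / r


# Toy example: three held-out references, five ranked candidates.
held_out_references = {"p1", "p2", "p3"}
model_ranking = ["p1", "p7", "p3", "p9", "p2"]
score = r_precision(model_ranking, held_out_references)
print(f"R-Precision: {score:.3f}")
```

In this toy case the top three candidates contain two of the three held-out references, so the score is 2/3. Because the cutoff equals the number of relevant items, a score of 1.0 means the top of the ranking reproduces the held-out reference list exactly.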

Last updated: 2020-06-03