Using the full-text content of academic articles to identify and evaluate algorithm entities in the domain of natural language processing, Journal of Informetrics

Journal of Informetrics (IF 3.4), Pub Date: 2020-10-11, DOI: 10.1016/j.joi.2020.101091
Yuzhuo Wang, Chengzhi Zhang

In the era of big data, the advancement, improvement, and application of algorithms in academic research have played an important role in promoting the development of different disciplines. Academic papers in various disciplines, especially computer science, contain a large number of algorithms. Identifying the algorithms in the full-text content of papers can reveal the popular or classical algorithms of a specific field and help scholars gain a comprehensive understanding of those algorithms and even of the field itself. To this end, this article takes the field of natural language processing (NLP) as an example and identifies algorithms in academic papers from that field. A dictionary of algorithms is constructed by manually annotating the contents of papers, and sentences containing algorithms in the dictionary are extracted through dictionary-based matching. The number of articles mentioning an algorithm is used as an indicator of that algorithm's influence. Our results reveal the algorithm with the highest influence in NLP papers and show that classification algorithms represent the largest proportion of the high-impact algorithms. In addition, the evolution of algorithm influence reflects changes in the research tasks and topics of the field, and different algorithms show different trends in how their influence changes. As a preliminary exploration, this paper analyzes the impact of algorithms mentioned in academic texts, and the results can serve as training data for large-scale automatic extraction of algorithms in the future. The methodology in this paper is domain-independent and can be applied to other domains.
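The abstract names two concrete steps: dictionary-based matching to pull out sentences that mention a dictionary algorithm, and counting the number of distinct articles that mention each algorithm as its influence indicator, which can also be tracked per year to follow how influence evolves. The Python sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. The toy dictionary (surface form mapped to a canonical name), the naive regex sentence splitter, case-insensitive whole-word matching, and all function names and sample data are hypothetical.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Assumption: naive splitter that breaks after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_algorithm_sentences(articles, dictionary):
    """Return (article_id, canonical_name, sentence) for each dictionary hit.

    articles: hypothetical mapping of article id -> full text.
    dictionary: hypothetical mapping of surface form -> canonical algorithm name
    (e.g. both "SVM" and "support vector machine" map to "SVM").
    """
    # Assumption: whole-word, case-insensitive matching of each surface form.
    patterns = [
        (re.compile(r"\b" + re.escape(form) + r"\b", re.IGNORECASE), canon)
        for form, canon in dictionary.items()
    ]
    hits = []
    for article_id, text in articles.items():
        for sentence in split_sentences(text):
            # A set, so a sentence naming one algorithm twice counts once.
            found = {canon for pattern, canon in patterns if pattern.search(sentence)}
            for canon in sorted(found):
                hits.append((article_id, canon, sentence))
    return hits


def article_counts(hits):
    """Influence indicator: number of distinct articles mentioning each algorithm."""
    articles_by_algorithm = defaultdict(set)
    for article_id, canon, _ in hits:
        articles_by_algorithm[canon].add(article_id)
    return {canon: len(ids) for canon, ids in articles_by_algorithm.items()}


def yearly_article_counts(hits, years):
    """Distinct-article counts per (algorithm, year), to track influence over time."""
    per_year = defaultdict(set)
    for article_id, canon, _ in hits:
        per_year[(canon, years[article_id])].add(article_id)
    return {key: len(ids) for key, ids in per_year.items()}


if __name__ == "__main__":
    # Hypothetical toy corpus and dictionary, for illustration only.
    articles = {
        "A1": "We train a support vector machine on n-gram features. The SVM baseline is strong.",
        "A2": "A CRF tagger is used for segmentation.",
        "A3": "We compare SVM and CRF models. Results improve.",
    }
    years = {"A1": 2010, "A2": 2012, "A3": 2012}
    dictionary = {
        "support vector machine": "SVM",
        "SVM": "SVM",
        "conditional random field": "CRF",
        "CRF": "CRF",
    }
    hits = extract_algorithm_sentences(articles, dictionary)
    print(article_counts(hits))
    print(yearly_article_counts(hits, years))
```

Counting distinct articles rather than raw mentions keeps one paper that repeats an algorithm name many times from inflating that algorithm's score, which matches the abstract's "number of articles mentioning an algorithm" indicator. Mapping synonyms and abbreviations to one canonical name is a design choice assumed here so that, for example, "SVM" and "support vector machine" are counted together.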

Updated: 2020-10-12
