An NLP-based citation reason analysis using CCRO, Scientometrics


Scientometrics (IF 3.9), Pub Date: 2021-03-26, DOI: 10.1007/s11192-021-03955-6
Imran Ihsan, M. Abdul Qadir

Artificial Intelligence and Natural Language Processing are major contributors to recent advances in document classification and information extraction. Classifying citations into different classes has attracted considerable attention because of the large volume of citations available in digital libraries. Typical citation classification relies on sentiment analysis, in which various techniques are applied to citation texts, mainly to classify them as “Positive”, “Negative”, or “Neutral”. However, an author can have innumerable reasons for citing another work. The Citations’ Context and Reasons Ontology (CCRO) uses a clear scientific method to articulate eight basic reasons for citing, derived through an iterative process of sentiment analysis, collaborative meanings, and experts' opinions. Using CCRO, this paper adopts an ontology-based approach to extract citation reasons and instantiate ontology classes and properties on two corpora of citation sentences: one is a publicly available dataset, and the other is our own manually curated corpus. The process has two steps. The first is an interface for manually annotating each citation text in the selected corpora with CCRO properties; a team of carefully selected annotators annotated each citation to achieve high inter-annotator agreement. The second is the automatic extraction of these reasons: using Natural Language Processing, a Mapping Graph, and the reporting verb in a citation sentence, the citation's reason is extracted and mapped onto a CCRO property. Accuracy is then calculated for both the publicly available corpus and our own corpus by comparing the manual and automatic mappings.
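
The second step and the accuracy comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the verb table, the property labels (`reason_extends` and so on), and the function names are all hypothetical placeholders, a flat dictionary lookup stands in for the Mapping Graph, and crude suffix stripping stands in for a real NLP lemmatizer.

```python
import re

# Hypothetical reporting-verb table. The labels are placeholders,
# not the actual CCRO property names.
VERB_TO_PROPERTY = {
    "extend": "reason_extends",
    "use": "reason_uses",
    "criticize": "reason_criticizes",
    "compare": "reason_compares",
}


def candidate_lemmas(token):
    """Yield the token plus crude base forms (a stand-in for a lemmatizer)."""
    yield token
    for suffix in ("s", "es", "d", "ed", "ing"):
        if token.endswith(suffix) and len(token) > len(suffix):
            stem = token[: -len(suffix)]
            yield stem
            yield stem + "e"  # e.g. "comparing" -> "compar" -> "compare"


def extract_reason(sentence):
    """Return the property of the first known reporting verb, or None."""
    for token in re.findall(r"[a-z]+", sentence.lower()):
        for lemma in candidate_lemmas(token):
            if lemma in VERB_TO_PROPERTY:
                return VERB_TO_PROPERTY[lemma]
    return None


def accuracy(manual, automatic):
    """Fraction of citations where the automatic label equals the manual one."""
    if len(manual) != len(automatic) or not manual:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(m == a for m, a in zip(manual, automatic))
    return matches / len(manual)
```

For example, `extract_reason("Smith et al. extended the model of [3].")` returns the placeholder label `"reason_extends"`, and a sentence with no known reporting verb returns `None`. A lookup this simple also matches nouns such as "use", which is one reason a real pipeline would rely on part-of-speech tagging and proper lemmatization.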


Updated: 2021-03-27
