Important citation identification by exploiting the syntactic and contextual information of citations
Scientometrics (IF 3.9), Pub Date: 2020-09-02, DOI: 10.1007/s11192-020-03677-1
Mingyang Wang, Jiaqi Zhang, Shijia Jiao, Xiangrong Zhang, Na Zhu, Guangsheng Chen

Citations are not equally important. Researchers have proposed various models and techniques to identify important citations. However, the features used in these works are relatively limited, so they cannot achieve good recognition performance. This paper proposes a new machine learning framework that distinguishes important from non-important citations by examining the syntactic and contextual information of citations. Syntactic features capture the statistical characteristics of citation behavior, such as how frequently the cited article is cited in the citing article and where those citations appear. Contextual features capture the semantic content of citations, such as their intent and polarity. Three feature selection algorithms (the Pearson correlation coefficient, Relief-F, and the entropy weight method) were used to calculate the contribution of each feature to distinguishing different kinds of citations. On this basis, the key features that best identify important citations were selected. Three classifiers (support vector machine, KNN, and random forest) were used to test the classification performance of these key features. The experiments were performed on two annotated benchmark datasets. The results show that the proposed framework achieves better classification performance than contemporary state-of-the-art research. The syntactic and contextual features of citations are of great value in identifying important citations.
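To make the feature-scoring step more concrete, here is a minimal sketch of two of the three scoring methods named in the abstract: the Pearson correlation coefficient and the entropy weight method. It is an illustration under stated assumptions, not the authors' implementation. The feature names (`citation_count`, `constant_feature`), the toy values, and the function names are all hypothetical, and the entropy weight formula is the common textbook form, which may differ in detail from the variant used in the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (norm_x * norm_y)


def entropy_weights(columns):
    """Entropy weight method over non-negative feature columns.

    Each column is turned into a distribution over samples; a column
    whose values are spread evenly has entropy near 1 and gets a weight
    near 0, while an uneven column gets a larger weight.
    Assumes at least two samples per column.
    """
    n = len(columns[0])
    divergences = []
    for col in columns:
        total = sum(col)
        if total == 0:
            divergences.append(0.0)
            continue
        probs = [v / total for v in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    total_div = sum(divergences)
    if total_div == 0:
        return [1.0 / len(columns)] * len(columns)
    return [d / total_div for d in divergences]


# Hypothetical toy data: one row per (citing paper, cited paper) pair.
citation_count = [1, 5, 1, 4, 2, 6]    # in-text mentions of the cited paper
constant_feature = [2, 2, 2, 2, 2, 2]  # an uninformative feature
labels = [0, 1, 0, 1, 0, 1]            # 1 = important citation (assumed)

correlation = pearson(citation_count, labels)
weights = entropy_weights([citation_count, constant_feature])
```

In this toy setup the constant column receives a weight near zero, which is the behavior a feature-screening step relies on. Note that the entropy weight method does not use the labels at all, whereas the Pearson score does, so the two methods rank features from different perspectives.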

Updated: 2020-09-02