Important citations identification by exploiting generative model into discriminative model, Journal of Information Science



Journal of Information Science (IF 1.8), Pub Date: 2021-02-07, DOI: 10.1177/0165551521991034
Xin An₁, Xin Sun₁, Shuo Xu₂, Liyuan Hao₂, Jinghong Li₁


Although citations between scientific documents are regarded as a vehicle for the dissemination, inheritance and development of scientific knowledge, not all citations are equally important. Numerous taxonomies and machine-learning models have been proposed to classify citation function and importance from a qualitative perspective. Inspired by the success of kernel functions derived from generative models in improving the performance of the support vector machine (SVM) model, this work explores the potential of combining generative and discriminative models for citation importance classification. In more detail, generative features are produced by a topic model, the citation influence model (CIM), and then fed, together with 13 other traditional features, to two traditional discriminative machine-learning models, SVM and random forest (RF), and one deep learning model, a convolutional neural network (CNN), to identify important citations. Extensive experiments are performed on two data sets with different characteristics. All three models perform better on the data set drawn from a single discipline. It is quite possible that the patterns of important citations vary by field, which prevents machine-learning models from effectively learning discriminative patterns from publications spanning multiple domains. The RF classifier outperforms the SVM classifier, which accords with many prior studies. However, the CNN model does not achieve the desired performance, owing to the small size of the data set. Furthermore, the CIM-based features further improve the performance of identifying important citations.
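The idea of feeding generative features into a discriminative classifier can be illustrated with a minimal sketch. It is not the authors' method: a Laplace-smoothed unigram language model stands in for the CIM topic model, a hand-written logistic regression stands in for SVM, RF and CNN, a single hypothetical `mentions` count stands in for the 13 traditional features, and the six toy citations are invented for illustration only.

```python
import math
from collections import Counter

# Toy records (invented): citing context, cited text, in-text mention count, label.
# Label 1 = important citation, 0 = incidental citation.
DATA = [
    ("topic model citation influence", "citation influence topic model inference", 4, 1),
    ("svm kernel classifier margin", "kernel svm margin classifier training", 3, 1),
    ("topic model citation influence", "protein folding energy landscape", 1, 0),
    ("svm kernel classifier margin", "galaxy survey redshift catalog", 1, 0),
    ("random forest feature importance", "random forest ensemble feature bagging", 2, 1),
    ("random forest feature importance", "ocean current salinity model", 1, 0),
]

# Vocabulary over all texts, needed for Laplace smoothing.
VOCAB = {tok for citing, cited, _, _ in DATA for tok in (citing + " " + cited).split()}


def avg_log_likelihood(citing_tokens, cited_tokens, vocab_size, alpha=1.0):
    """Generative feature: mean per-token log-probability of the citing text
    under a Laplace-smoothed unigram model estimated from the cited text."""
    counts = Counter(cited_tokens)
    total = len(cited_tokens) + alpha * vocab_size
    log_probs = [math.log((counts[t] + alpha) / total) for t in citing_tokens]
    return sum(log_probs) / len(log_probs)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.2, epochs=10000):
    """Discriminative model: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (important) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Feature vector = [generative feature, traditional feature].
X = [
    [avg_log_likelihood(citing.split(), cited.split(), len(VOCAB)), float(mentions)]
    for citing, cited, mentions, _ in DATA
]
y = [label for _, _, _, label in DATA]

w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])
```

The generative feature is larger when the cited text explains the citing context well. The classifier then weighs it against the traditional feature, which mirrors, in miniature, the feature-combination design described in the abstract.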


Updated: 2021-02-08
