Scientific Papers Citation Analysis using Textual Features and SMOTE Resampling Techniques
Pattern Recognition Letters ( IF 3.9 ) Pub Date : 2021-07-24 , DOI: 10.1016/j.patrec.2021.07.009
Muhammad Umer 1, 2, Saima Sadiq 1, Malik Muhammad Saad Missen 2, Zahid Hameed 3, Zahid Aslam 2, Muhammad Abubakar Siddique 1, Michele Nappi 4

Ascertaining the impact of research is important for the research community and for academia across all disciplines. The only prevalent measure for quantifying research quality is the citation count. Although the number of citations plays a significant role in academic research, citations can be biased, or may be made only to discuss the weaknesses and shortcomings of the cited work. Considering the sentiment of citations and recognizing patterns in their text can aid in understanding the opinion of the peer research community and help quantify the quality of research articles. Efficient feature representation combined with machine learning classifiers has yielded significant improvements in text classification. However, the effectiveness of such combinations has not been analyzed for citation sentiment analysis. This study investigates pattern recognition using machine learning models in combination with frequency-based and prediction-based feature representation techniques, with and without the Synthetic Minority Oversampling Technique (SMOTE), on a publicly available citation sentiment dataset. The sentiment of each citation instance is classified as positive, negative, or neutral. Results indicate that the Extra Tree classifier in combination with Term Frequency-Inverse Document Frequency (TF-IDF) achieved 98.26% accuracy on the SMOTE-balanced dataset.
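
To make the resampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest same-class neighbours. This is not the authors' implementation. The function names `smote` and `balance`, the parameter defaults, and the toy two-dimensional vectors are assumptions made for illustration; in the study the inputs would be text feature vectors such as TF-IDF, and in practice a maintained library such as imbalanced-learn would normally be used instead.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest same-class neighbours.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other same-class samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(features, labels, k=2, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = list(features), list(labels)
    for y, rows in by_class.items():
        new_rows = smote(rows, target - len(rows), k=k, seed=seed)
        out_x.extend(new_rows)
        out_y.extend([y] * len(new_rows))
    return out_x, out_y


# Toy data: four "neutral" vectors and two "negative" vectors.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["neutral"] * 4 + ["negative"] * 2
balanced_x, balanced_y = balance(features, labels)
```

After balancing, both classes have four samples, and the two synthetic "negative" vectors lie between the two original ones. A classifier would then be trained on the balanced set; resampling is normally applied to the training split only, so that test scores are not computed on synthetic points.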




Updated: 2021-07-24