Return to basics: Clustering of scientific literature using structural information, Journal of Informetrics


Journal of Informetrics (IF 3.4), Pub Date: 2020-10-10, DOI: 10.1016/j.joi.2020.101099
Jinhyuk Yun, Sejung Ahn, June Young Lee

Scholars frequently employ relatedness measures to estimate the similarity between two different items (e.g., documents, authors, and institutes). Such relatedness measures are commonly based on overlapping references (i.e., bibliographic coupling) or citations (i.e., co-citation) and can then be used with cluster analysis to find boundaries between research fields. Unfortunately, calculating a relatedness measure is challenging, especially for a large number of items, because the computational complexity is greater than linear. We propose an alternative method for identifying research fronts that uses direct citation inspired by relatedness measures. Our novel approach simply replicates a node into two distinct nodes: a citing node and cited node. We then apply typical clustering methods to the modified network. Clusters of citing nodes should emulate those from the bibliographic coupling relatedness network, while clusters of cited nodes should act like those from the co-citation relatedness network. In validation tests, our proposed method demonstrated high levels of similarity with conventional relatedness-based methods. We also found that the clustering results of the proposed method outperformed those of conventional relatedness-based measures regarding similarity with natural language processing-based classification.
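The node-replication idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy citation list, and the use of connected components as a stand-in for the "typical clustering methods" are all assumptions made for illustration. Each paper is split into a citing copy and a cited copy, and every direct citation becomes an undirected edge between the citing copy of the source and the cited copy of the target. Two citing copies that share neighbors correspond to bibliographic coupling, and two cited copies that share neighbors correspond to co-citation, without building a pairwise relatedness matrix.

```python
from collections import defaultdict


def split_citation_network(citations):
    """Build an undirected bipartite graph from directed (citing, cited) pairs.

    Each paper p is replicated into ("citing", p) and ("cited", p). A direct
    citation a -> b becomes an edge between ("citing", a) and ("cited", b).
    """
    adj = defaultdict(set)
    for src, dst in citations:
        u, v = ("citing", src), ("cited", dst)
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def shared_neighbors(adj, u, v):
    """Count the neighbors that two nodes of the split network have in common."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def connected_components(adj):
    """Stand-in clustering (an assumption): group nodes by connected component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Hypothetical toy data: (citing paper, cited paper). Paper D both cites and is cited.
citations = [
    ("A", "X"), ("A", "Y"),
    ("B", "X"), ("B", "Y"),
    ("C", "Z"), ("D", "Z"), ("C", "D"),
]

adj = split_citation_network(citations)
components = connected_components(adj)

# Project each cluster back onto the citing copies and the cited copies.
citing_clusters = sorted(
    sorted(paper for role, paper in comp if role == "citing") for comp in components
)
cited_clusters = sorted(
    sorted(paper for role, paper in comp if role == "cited") for comp in components
)
```

In this toy network, the citing copies of A and B share two cited neighbors (X and Y), which is their bibliographic coupling count, and the cited copies of X and Y share two citing neighbors (A and B), which is their co-citation count. A real analysis would replace the connected-components step with a community detection method suited to large networks.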


Updated: 2020-10-11
