Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches
Journal of the Association for Information Science and Technology (IF 2.8), Pub Date: 2021-01-23, DOI: 10.1002/asi.24452
Peter Sjögårde, Per Ahlgren, Ludo Waltman

Algorithmic classifications of research publications can be used to study many different aspects of the science system, such as the organization of science into fields, the growth of fields, interdisciplinarity, and emerging topics. How to label the classes in these classifications is a problem that has not been thoroughly addressed in the literature. In this study we evaluate different approaches to label the classes in algorithmically constructed classifications of research publications. We focus on two important choices: the choice of (1) different bibliographic fields and (2) different approaches to weight the relevance of terms. To evaluate the different choices, we created two baselines: one based on the Medical Subject Headings in MEDLINE and another based on the Science-Metrix journal classification. We tested to what extent different approaches yield the desired labels for the classes in the two baselines. Based on our results we recommend extracting terms from titles and keywords to label classes at high levels of granularity (e.g. topics). At low levels of granularity (e.g. disciplines) we recommend extracting terms from journal names and author addresses. We recommend the use of a new approach, term frequency to specificity ratio, to calculate the relevance of terms.
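
As a rough illustration of the kind of labeling the abstract describes, the sketch below extracts terms from publication titles and ranks them per class by a simple relevance score. The score used here (a term's in-class document count multiplied by the share of its occurrences that fall in that class) is an assumption chosen for illustration; it is not the paper's term frequency to specificity ratio, whose definition is not given in the abstract. The function name `label_classes`, the whitespace tokenizer, and the toy titles are likewise hypothetical.

```python
from collections import Counter, defaultdict


def label_classes(docs, top_n=2):
    """Return {class_id: [top terms]} for an iterable of (class_id, title).

    Illustrative score (an assumption, NOT the paper's TFSR):
        score = n_in_class * (n_in_class / n_overall)
    where counts are numbers of titles containing the term.
    """
    in_class = defaultdict(Counter)  # titles per term, within each class
    overall = Counter()              # titles per term, across all classes
    for class_id, title in docs:
        terms = set(title.lower().split())  # count each term once per title
        in_class[class_id].update(terms)
        overall.update(terms)

    labels = {}
    for class_id, counts in in_class.items():
        scores = {t: n * (n / overall[t]) for t, n in counts.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels[class_id] = ranked[:top_n]
    return labels


# Toy data: two classes with two titles each (hypothetical).
docs = [
    ("A", "graphene oxide synthesis"),
    ("A", "graphene sensor study"),
    ("B", "citation analysis study"),
    ("B", "citation network mapping"),
]
labels = label_classes(docs, top_n=1)
print(labels)
```

In this toy data, "study" appears in both classes, so its score is lower than that of terms found in only one class. Swapping the title field for journal names or author addresses would correspond to the abstract's recommendation for coarser classes such as disciplines.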

Updated: 2021-01-23