Citation context-based topic models: discovering cited and citing topics from full text, Library Hi Tech



Library Hi Tech, Pub Date: 2021-06-04, DOI: 10.1108/lht-01-2021-0041
Lixue Zou, Xiwen Liu, Wray Buntine, Yanli Liu

Purpose

The full text of a document is a rich source of information from which meaningful topics can be extracted. The purpose of this paper is to demonstrate how citation context (CC) in the full text can be used to identify cited topics and citing topics efficiently and effectively by means of automatic text analysis algorithms.

Design/methodology/approach

The authors present two novel topic models, Citation-Context-LDA (CC-LDA) and Citation-Context-Reference-LDA (CCRef-LDA). CC is leveraged to extract the citing text from the full text, which makes it possible to discover topics accurately. CC-LDA incorporates CC, citing text and their latent relationship, while CCRef-LDA additionally incorporates the reference information in CC. Collapsed Gibbs sampling is used for approximate estimation. The capacity of CC-LDA to learn cited topics and citing topics simultaneously, together with their links, is investigated. Moreover, a topic influence measure based on CC-LDA is proposed and applied to create links between the two levels of topics. In addition, the capacity of CCRef-LDA to discover topic-influential references is investigated.
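To make the estimation step more concrete, the following is a minimal sketch of collapsed Gibbs sampling for plain LDA. It is not the authors' CC-LDA or CCRef-LDA sampler, whose update equations are not given in this abstract; it only illustrates the general technique on which such models build. The function name `lda_collapsed_gibbs`, the symmetric hyperparameters `alpha` and `beta`, and the representation of documents as lists of integer word ids are all assumptions made for illustration.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for standard LDA (illustrative sketch only).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (assignments, doc_topic, topic_word) as nested lists of counts.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n_{k,w}
    topic_total = [0] * num_topics                               # n_k
    assignments = []

    # Random initialisation of the topic assignment of every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]

                # Resample the topic and restore the counts.
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return assignments, doc_topic, topic_word
```

A model in the spirit of CC-LDA would presumably keep separate count tables for citation-context tokens and citing-text tokens and couple them through a latent link, but those details would have to be taken from the paper itself.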

Findings

The results indicate that CC-LDA and CCRef-LDA achieve improved or comparable performance in terms of both perplexity and symmetric Kullback–Leibler (sKL) divergence. Moreover, CC-LDA is effective in discovering cited topics and citing topics together with topic influence, and CCRef-LDA is able to find the references that are influential for the cited topics.
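For readers unfamiliar with the two evaluation measures, the sketch below shows how they are commonly computed. The abstract does not state the exact definitions used in the paper, so two conventions are assumed here: sKL is taken as the average of the two directed KL divergences between discrete distributions, and perplexity is the exponential of the negative mean per-token log-likelihood. The function names and the `eps` smoothing constant are illustrative choices.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # eps guards against log(0) when q has zero entries (an assumed smoothing).
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL, assumed here to be the mean of both directions."""
    return 0.5 * (kl_divergence(p, q, eps) + kl_divergence(q, p, eps))


def perplexity(token_log_probs):
    """exp(-mean log-likelihood) over held-out tokens (natural logarithms)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

Under these conventions, lower perplexity indicates a better fit to held-out text. How sKL values are interpreted depends on which distributions are being compared, which the abstract does not specify.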

Originality/value

The automatic method offers a new way to discover cited topics and citing topics. The topic influence learnt by the model can link the two levels of topics and create a semantic topic network. The method can also use topic specificity as a feature to rank references.
