Random-walk Based Generative Model for Classifying Document Networks, arXiv - CS - Information Retrieval


arXiv - CS - Information Retrieval. Pub Date: 2020-01-21, DOI: arxiv-2001.07380
Takafumi J. Suzuki

Document networks are found in various collections of real-world data, such as citation networks, hyperlinked web pages, and online social networks. A large number of generative models have been proposed because they offer intuitive and useful pictures for analyzing document networks. Prominent examples are relational topic models, where documents are linked according to their topic similarities. However, existing generative models do not make full use of network structure because they depend largely on topic modeling of documents. In particular, the centrality of graph nodes is missing from the generative processes of previous models. In this paper, we propose a novel generative model for document networks that introduces random walkers on networks to integrate node centrality into the link generation process. The developed method is evaluated on semi-supervised classification tasks with real-world citation networks. We show that the proposed model outperforms existing probabilistic approaches, especially in detecting communities in connected networks.
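To make the notion of random-walk node centrality concrete, here is a minimal sketch in Python. It is not the model proposed in the paper: the abstract does not specify how the walkers behave, so this example assumes a PageRank-style walk with uniform restart and computes its stationary distribution by power iteration. The function name `random_walk_centrality`, the `damping` value, and the toy adjacency matrix are all illustrative assumptions.

```python
import numpy as np


def random_walk_centrality(adjacency, damping=0.85, tol=1e-10, max_iter=1000):
    """Stationary distribution of a random walk with uniform restart.

    Assumption: a PageRank-style walker, used here only to illustrate
    random-walk centrality; the paper's walker dynamics may differ.
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=1)

    # Row-normalise the adjacency matrix into transition probabilities.
    # A node with no outgoing links jumps to any node uniformly at random.
    P = np.where(
        out_degree[:, None] > 0,
        A / np.maximum(out_degree, 1.0)[:, None],
        1.0 / n,
    )

    pi = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        # Follow a link with probability `damping`, otherwise restart uniformly.
        new_pi = damping * (pi @ P) + (1.0 - damping) / n
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi
        pi = new_pi
    return pi


# Hypothetical toy network: node 0 is a hub linked to nodes 1, 2 and 3;
# nodes 1 and 2 are also linked to each other.
toy_adjacency = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
])
centrality = random_walk_centrality(toy_adjacency)
```

In this toy network the hub (node 0) receives the largest share of the walker's time and the leaf (node 3) the smallest. That is the kind of structural signal that a link model based on topic similarity alone would not capture.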


Updated: 2020-01-22
