Deep NMF Topic Modeling
arXiv - CS - Information Retrieval, Pub Date: 2021-02-24, DOI: arxiv-2102.12998
JianYu Wang, Xiao-Lei Zhang

Topic modeling methods based on nonnegative matrix factorization (NMF) rely little on model or data assumptions. However, they are usually formulated as difficult optimization problems, which may suffer from bad local minima and high computational complexity. In this paper, we propose a deep NMF (DNMF) topic modeling framework to alleviate these problems. It first applies an unsupervised deep learning method to learn the latent hierarchical structure of documents, under the assumption that a good document representation, learned for example by a deep model, makes topic word discovery easier. It then uses the output of the deep model to constrain a topic-document distribution for the discovery of discriminant topic words, which not only improves efficacy but also reduces computational complexity relative to conventional unsupervised NMF methods. We constrain the topic-document distribution in three ways, which take advantage of the three major sub-categories of NMF: basic NMF, structured NMF, and constrained NMF, respectively. To overcome the weaknesses of deep neural networks in unsupervised topic modeling, we adopt a deep model that is not a neural network, the multilayer bootstrap network. To our knowledge, this is the first time that a deep NMF model has been used for unsupervised topic modeling. We compare the proposed method with a number of representative reference methods covering the major branches of topic modeling on a variety of real-world text corpora. Experimental results demonstrate the effectiveness of the proposed method under various evaluation metrics.
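
The idea of constraining the topic-document distribution with the output of a separate model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the deep model has already produced one hard cluster label per document (the multilayer bootstrap network itself is not implemented here), fixes the topic-document matrix `H` to the one-hot encoding of those labels, and solves only for a nonnegative topic-word matrix `W` with standard multiplicative updates. The function name `topic_words_from_labels`, the toy vocabulary, and the toy counts are all hypothetical.

```python
import numpy as np


def topic_words_from_labels(X, labels, n_topics, n_iter=200, eps=1e-10, seed=0):
    """Estimate a nonnegative topic-word matrix with H held fixed.

    X        : (n_words, n_docs) nonnegative term-document matrix.
    labels   : (n_docs,) integer topic label per document, assumed to come
               from some upstream document-clustering model.
    Returns W (n_words, n_topics) and the fixed H (n_topics, n_docs).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_words, n_docs = X.shape

    # One-hot topic-document matrix; it is never updated below.
    H = np.zeros((n_topics, n_docs))
    H[labels, np.arange(n_docs)] = 1.0

    rng = np.random.default_rng(seed)
    W = rng.random((n_words, n_topics)) + eps

    # With H fixed, minimizing ||X - W H||_F over W >= 0 is a convex
    # problem; these are the usual multiplicative updates for W.
    XHt = X @ H.T
    HHt = H @ H.T
    for _ in range(n_iter):
        W *= XHt / (W @ HHt + eps)
    return W, H


# Toy corpus: rows are words, columns are documents (hypothetical counts).
vocab = ["goal", "match", "team", "stock", "market", "price"]
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 1, 2, 1],
])
labels = np.array([0, 0, 1, 1])  # assumed output of the upstream model

W, H = topic_words_from_labels(X, labels, n_topics=2)
for k in range(W.shape[1]):
    top = np.argsort(W[:, k])[::-1][:2]
    print(f"topic {k}:", [vocab[i] for i in top])
```

Because `H` is one-hot in this sketch, each column of `W` converges to the mean word-count vector of the documents assigned to that topic, so only one factor has to be estimated. The three constraint schemes described in the abstract are more general than this hard-assignment case.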

Updated: 2021-02-26