
A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing
Machine Learning (IF 4.3), Pub Date: 2014-11-15, DOI: 10.1007/s10994-014-5470-z
Karthik Devarajan, Guoli Wang, Nader Ebrahimi


Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional non-negative matrix $V$ into the product of two non-negative matrices, $W$ and $H$, such that $V \sim WH$. It has been shown to yield a parts-based, sparse representation of the data. NMF has been applied successfully in a variety of areas, such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology, for the analysis and interpretation of large-scale data. In parallel, a related statistical latent class modeling approach, probabilistic latent semantic indexing (PLSI), has been developed for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Rényi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of the monotonicity of the multiplicative updates for $W$ and $H$. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach, as well as its superior performance relative to existing methods, on real-life and simulated document clustering data.
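The abstract does not state the update equations. As a minimal sketch of the simplest member of this family, assuming that the Poisson-likelihood case corresponds to the generalized Kullback-Leibler (I-)divergence (the α → 1 limit of the Rényi family), the code below implements the classic Lee-Seung multiplicative updates for $W$ and $H$. It records the divergence after every iteration so the monotone decrease can be checked, and it rescales the factors into PLSI parameters $P(w \mid z)$, $P(z)$ and $P(d \mid z)$. This is not the authors' generalized Rényi-divergence algorithm; the function names (`matmul`, `kl_divergence`, `nmf_kl`, `to_plsi`) and the toy count matrix are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Plain-Python matrix product of list-of-lists A (n x k) and B (k x m)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kl_divergence(V, W, H, eps=1e-12):
    """Generalized KL divergence D(V || WH); equal to the Poisson negative
    log-likelihood up to terms that do not depend on W and H."""
    WH = matmul(W, H)
    total = 0.0
    for i in range(len(V)):
        for j in range(len(V[0])):
            v, x = V[i][j], WH[i][j] + eps
            total += (v * math.log(v / x) if v > 0 else 0.0) - v + x
    return total


def nmf_kl(V, rank, iters=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for KL-NMF (a hypothetical stand-in,
    not the paper's Renyi-divergence algorithm). Returns W, H, divergences."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = [kl_divergence(V, W, H, eps)]
    for _ in range(iters):
        # H update with W fixed: H_aj *= sum_i W_ia V_ij/(WH)_ij / sum_i W_ia
        WH = matmul(W, H)
        R = [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        for a in range(rank):
            w_sum = sum(W[i][a] for i in range(n))
            for j in range(m):
                H[a][j] *= sum(W[i][a] * R[i][j] for i in range(n)) / (w_sum + eps)
        # W update with the new H fixed: W_ia *= sum_j H_aj V_ij/(WH)_ij / sum_j H_aj
        WH = matmul(W, H)
        R = [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        for a in range(rank):
            h_sum = sum(H[a])
            for i in range(n):
                W[i][a] *= sum(H[a][j] * R[i][j] for j in range(m)) / (h_sum + eps)
        history.append(kl_divergence(V, W, H, eps))
    return W, H, history


def to_plsi(W, H):
    """Rescale KL-NMF factors into PLSI parameters P(w|z), P(z), P(d|z)."""
    rank = len(H)
    w_sums = [sum(row[a] for row in W) for a in range(rank)]
    h_sums = [sum(H[a]) for a in range(rank)]
    total = sum(w_sums[a] * h_sums[a] for a in range(rank))  # equals sum of WH
    p_w_given_z = [[row[a] / w_sums[a] for a in range(rank)] for row in W]
    p_z = [w_sums[a] * h_sums[a] / total for a in range(rank)]
    p_d_given_z = [[h / h_sums[a] for h in H[a]] for a in range(rank)]
    return p_w_given_z, p_z, p_d_given_z


if __name__ == "__main__":
    # Toy word-by-document count matrix (rows: words, columns: documents).
    V = [[5, 4, 0, 1],
         [4, 5, 1, 0],
         [0, 1, 6, 5],
         [1, 0, 5, 6]]
    W, H, history = nmf_kl(V, rank=2, iters=100)
    print(f"divergence: {history[0]:.3f} -> {history[-1]:.3f}")
    p_w_given_z, p_z, p_d_given_z = to_plsi(W, H)
    print("P(z) =", [round(p, 3) for p in p_z])
    # Document clustering: assign each document to its most probable topic.
    clusters = [max(range(len(p_z)), key=lambda a: p_z[a] * p_d_given_z[a][j])
                for j in range(len(V[0]))]
    print("clusters:", clusters)
```

Each KL multiplicative update preserves the total mass, $\sum_{ij}(WH)_{ij} = \sum_{ij} V_{ij}$, so normalizing the factors this way gives a valid joint distribution over word-document pairs. This is the usual sense in which KL-NMF and PLSI optimize the same objective; the paper's generalization to the Rényi family is not reproduced here.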


Updated: 2014-11-15
