Spectral Learning for Supervised Topic Models, IEEE Transactions on Pattern Analysis and Machine Intelligence


IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2017-03-15, DOI: 10.1109/tpami.2017.2682085
Yong Ren, Yining Wang, Jun Zhu

Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on variational approximation or Monte Carlo sampling, which often suffer from local optima. Spectral methods have been applied to learn unsupervised topic models, such as latent Dirichlet allocation (LDA), with provable guarantees. This paper investigates the possibility of applying spectral methods to recover the parameters of supervised LDA (sLDA). We first present a two-stage spectral method, which recovers the parameters of LDA and then applies a power update method to recover the regression model parameters. We then present a single-phase spectral algorithm that jointly recovers the topic distribution matrix and the regression weights. Our spectral algorithms are provably correct and computationally efficient. We prove a sample complexity bound for each algorithm and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the spectral algorithms. In fact, our results on a large-scale review rating dataset demonstrate that our single-phase spectral algorithm alone achieves performance comparable to or even better than state-of-the-art methods, while previous work on spectral methods has rarely reported such promising performance.
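
The abstract does not spell out its algorithms, so the following is only a hedged illustration of the kind of power update that spectral methods for topic models typically build on: tensor power iteration with deflation on a symmetric, orthogonally decomposable third-order tensor. It is not the paper's method. The function names (`tensor_power_iteration`, `decompose`) and the synthetic tensor are assumptions made for this sketch, and it skips the moment estimation and whitening steps that a real spectral LDA pipeline would need.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, n_restarts=10, seed=0):
    """Find one (eigenvalue, eigenvector) pair of a symmetric 3-way tensor T.

    Hypothetical helper for illustration; assumes T is (close to)
    orthogonally decomposable with positive eigenvalues.
    """
    rng = np.random.default_rng(seed)
    k = T.shape[0]
    best_val, best_vec = -np.inf, None
    for _ in range(n_restarts):
        v = rng.normal(size=k)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            # Power update: v <- T(I, v, v) / ||T(I, v, v)||
            v = np.einsum("ijk,j,k->i", T, v, v)
            v /= np.linalg.norm(v)
        # Eigenvalue estimate: T(v, v, v)
        val = np.einsum("ijk,i,j,k->", T, v, v, v)
        if val > best_val:
            best_val, best_vec = val, v
    return best_val, best_vec

def decompose(T, n_components):
    """Recover n_components eigenpairs by repeated power iteration and deflation."""
    T = T.copy()
    vals, vecs = [], []
    for _ in range(n_components):
        lam, v = tensor_power_iteration(T)
        vals.append(lam)
        vecs.append(v)
        # Deflation: subtract the recovered rank-one component
        T -= lam * np.einsum("i,j,k->ijk", v, v, v)
    return np.array(vals), np.array(vecs)

# Synthetic example (assumed data): T = sum_i lam_i * q_i (x) q_i (x) q_i
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # orthonormal columns q_i
true_vals = np.array([3.0, 2.0, 1.0])
T = np.einsum("r,ir,jr,kr->ijk", true_vals, Q, Q, Q)

vals, vecs = decompose(T, n_components=3)
print(np.sort(vals))
```

In this toy setting the recovered eigenvalues should match the planted ones up to ordering. In a topic-model setting the eigenpairs would be mapped back to topic parameters after undoing the whitening, a step this sketch leaves out.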


Updated: 2018-02-06
