Unsupervised learning of textual pattern based on Propagation in Bipartite Graph, Intelligent Data Analysis

Unsupervised learning of textual pattern based on Propagation in Bipartite Graph
Intelligent Data Analysis (IF 0.9), Pub Date: 2020-05-21, DOI: 10.3233/ida-194528
Thiago de Paulo Faleiros ₁, Alan Valejo ₂, Alneu de Andrade Lopes ₂

Graph-based algorithms have attracted considerable interest in recent years because they facilitate pattern recognition and learning through information propagation over a graph. Here, we propose an unsupervised learning algorithm based on propagation in a bipartite graph, referred to as the Propagation in Bipartite Graph (PBG) algorithm. The contributions of this approach are threefold: 1) we present an iterative graph-based algorithm and a straightforward bipartite representation for textual data, in which vertices represent documents and words, and edges between documents and words represent the occurrences of the words in the documents; 2) we show that PBG is more flexible and easier to adapt to different applications than the mathematical formalism of generative models; and 3) we present a comprehensive evaluation and comparison of PBG with other topic extraction techniques. We describe the strategy employed in the PBG algorithm as a problem of maximizing the similarity between latent vectors assigned to vertices and edges, and demonstrate that the proposed strategy can be improved by assigning good initial values to the vectors. We note that PBG can be parallelized by a simple adjustment to the algorithm. We also show that the proposed algorithm is competitive with LDA and NMF in the task of textual collection modelling, returning coherent topics, and in the dimensionality reduction task.
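
The bipartite representation described in the abstract can be illustrated with a short Python sketch. The graph construction follows the description directly: document and word vertices, with edges weighted by occurrence counts. The propagation loop, however, is only an illustrative assumption: the abstract does not give PBG's update rules, so `propagate`, `normalize`, the random initialization, and the multiply-and-normalize step are hypothetical stand-ins for a generic propagation scheme, not the authors' algorithm.

```python
import random
from collections import Counter


def build_bipartite_graph(docs):
    """Map (doc_id, word) edges to the number of times the word occurs in the document."""
    edges = {}
    for doc_id, text in enumerate(docs):
        for word, count in Counter(text.lower().split()).items():
            edges[(doc_id, word)] = count
    return edges


def normalize(vec):
    """Scale a non-negative vector to sum to 1 (uniform if it sums to 0)."""
    total = sum(vec)
    if total > 0:
        return [v / total for v in vec]
    return [1.0 / len(vec)] * len(vec)


def propagate(edges, n_topics=2, n_iters=20, seed=0):
    """Hypothetical propagation loop; NOT the update rule from the paper."""
    rng = random.Random(seed)
    doc_ids = sorted({d for d, _ in edges})
    words = sorted({w for _, w in edges})
    # Assumption: each vertex starts with a random latent vector.
    doc_vec = {d: normalize([rng.random() + 0.1 for _ in range(n_topics)])
               for d in doc_ids}
    word_vec = {w: normalize([rng.random() + 0.1 for _ in range(n_topics)])
                for w in words}
    for _ in range(n_iters):
        new_doc = {d: [0.0] * n_topics for d in doc_ids}
        new_word = {w: [0.0] * n_topics for w in words}
        for (d, w), count in edges.items():
            # The edge vector combines the latent vectors of its two endpoints.
            edge_vec = normalize([a * b for a, b in zip(doc_vec[d], word_vec[w])])
            # Propagate the weighted edge vector back to both endpoints.
            for k in range(n_topics):
                new_doc[d][k] += count * edge_vec[k]
                new_word[w][k] += count * edge_vec[k]
        doc_vec = {d: normalize(v) for d, v in new_doc.items()}
        word_vec = {w: normalize(v) for w, v in new_word.items()}
    return doc_vec, word_vec


docs = ["graph vertex edge graph", "topic word document topic"]
edges = build_bipartite_graph(docs)
doc_vec, word_vec = propagate(edges)
print(edges)
print(doc_vec)
```

Because each edge is processed independently within an iteration of this sketch, the inner loop could in principle be split across workers, which is one way to read the abstract's remark that PBG can be parallelized by a simple adjustment.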

Updated: 2020-06-30
