Bi-Directional Recurrent Attentional Topic Model
ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2020-09-29, DOI: 10.1145/3412371
Shuangyin Li, Yu Zhang, Rong Pan

In a document, the topic distribution of a sentence depends on both the topics of its neighboring sentences and its own content, and the neighboring sentences typically influence it with different weights. The neighboring sentences of a sentence include both the preceding and the subsequent sentences. Meanwhile, a document can naturally be treated as a sequence of sentences. Most existing work on Bayesian document modeling does not take these points into consideration. To fill this gap, we propose a bi-Directional Recurrent Attentional Topic Model (bi-RATM) for document embedding. The bi-RATM not only takes advantage of the sequential order among sentences but also uses an attention mechanism to model the relations among successive sentences. To support the bi-RATM, we propose a bi-Directional Recurrent Attentional Bayesian Process (bi-RABP) to handle such sequences. Based on the bi-RABP, the bi-RATM fully utilizes the bi-directional sequential information of the sentences in a document. An online bi-RATM is proposed to handle large-scale corpora. Experiments on two corpora show that the proposed model outperforms state-of-the-art methods on document modeling and classification.
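
The abstract describes the mechanism only at a high level. The sketch below is a minimal, non-Bayesian illustration of the core idea: a sentence's topic distribution is formed as an attention-weighted blend of its own content-based topics and the topics of its preceding and subsequent sentences. The function and parameter names (sentence_topics, content_topics, window) are hypothetical and not taken from the paper.

# Illustrative sketch only, not the paper's actual generative process.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sentence_topics(content_topics, s, window=2):
    """Blend sentence s's own topic proposal with its neighbors' topics.

    content_topics: (num_sentences, num_topics) array, each row a topic
    distribution inferred from a sentence's own words.
    """
    n, k = content_topics.shape
    own = content_topics[s]
    # Neighbors on both sides (bi-directional context).
    idx = [j for j in range(max(0, s - window), min(n, s + window + 1)) if j != s]
    if not idx:
        return own
    neighbors = content_topics[idx]           # (m, k)
    # Attention scores: similarity between the sentence and each neighbor.
    scores = neighbors @ own                  # (m,)
    weights = softmax(scores)                 # attention weights over neighbors
    context = weights @ neighbors             # weighted mix of neighbor topics
    # Final distribution: convex combination of own content and context.
    mix = 0.5 * own + 0.5 * context
    return mix / mix.sum()

# Toy usage: 4 sentences, 3 topics.
rng = np.random.default_rng(0)
theta = rng.dirichlet(np.ones(3), size=4)
print(sentence_topics(theta, s=1))

In this toy form the attention weights are simple similarity scores; in the paper they are part of a Bayesian process (the bi-RABP) inferred jointly with the topics.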

Updated: 2020-09-29