Topic modeling for sequential documents based on hybrid inter-document topic dependency
Journal of Intelligent Information Systems (IF 3.4), Pub Date: 2021-01-25, DOI: 10.1007/s10844-020-00635-4
Wenbo Li, Hiroto Saigo, Bin Tong, Einoshin Suzuki

We propose two new topic modeling methods for sequential documents based on hybrid inter-document topic dependency. Topic modeling for sequential documents is the basis of many attractive applications, such as emerging topic clustering and novel topic detection. For these tasks, most existing models introduce inter-document dependencies between topic distributions. In real situations, however, adjacent emerging topics are often intertwined and mixed with outliers, and these single-dependency models have difficulty handling topic evolution in sequential documents where multiple topics and outliers are mixed. To solve this problem, our first method considers three kinds of topic dependencies for each document, which model its probabilities of belonging to a fading topic, an emerging topic, or an independent topic. Our second method extends the first by considering fine-grained dependencies in a given context, so that it can handle more complex topic evolution sequences. Experiments on six standard topic modeling datasets show that our proposals outperform state-of-the-art models in terms of the accuracy of topic modeling, the quality of topic clustering, and the effectiveness of outlier detection.




Updated: 2021-01-28