An Online Semantic-Enhanced Graphical Model for Evolving Short Text Stream Clustering
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 2021-09-30, DOI: 10.1109/tcyb.2021.3108897
Jay Kumar 1 , Salah Ud Din 1 , Qinli Yang 1 , Rajesh Kumar 2 , Junming Shao 1

Due to the popularity of social media and online fora such as Twitter, Reddit, Facebook, and WeChat, short text stream clustering has gained significant attention in recent years. However, most existing short text stream clustering approaches work on static data and tend to suffer from a “term ambiguity” problem caused by the sparse word representation. Moreover, they often process short text streams in batches and struggle to find evolving topics in term-changing subspaces. In this article, we propose an online semantic-enhanced graphical model for evolving short text stream clustering (OSGM), which exploits word-occurrence semantic information and dynamically maintains evolving active topics in term-changing subspaces in an online way. Compared to existing approaches, our online model not only removes the need to determine an optimal batch size but also handles large-scale data streams efficiently. It is also able to handle the “term ambiguity” problem without incorporating features from external resources. More importantly, to the best of our knowledge, it is the first work to extract evolving topics in term-changing subspaces automatically in an online way. Extensive experiments demonstrate that the proposed model outperforms many state-of-the-art algorithms on both synthetic and real-world datasets.
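
As a rough illustration of what processing a short text stream "in an online way" can look like, the toy Python sketch below assigns each incoming text to a cluster one document at a time, with no batch size, scores candidates using term co-occurrence inside each cluster, and drops clusters that have been idle for too long. It is not the OSGM model described in the abstract: the scoring rule, the `threshold` and `max_idle` parameters, and the `Cluster` and `OnlineClusterer` names are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


class Cluster:
    """Running term and term-pair statistics for one topic (hypothetical helper)."""

    def __init__(self, now):
        self.term_counts = Counter()  # term -> frequency within this cluster
        self.pair_counts = Counter()  # (term_a, term_b) -> co-occurrence count
        self.last_update = now        # stream position of the latest assignment

    def add(self, terms, now):
        self.term_counts.update(terms)
        self.pair_counts.update(combinations(sorted(set(terms)), 2))
        self.last_update = now

    def score(self, terms):
        """Average of term overlap and term-pair overlap (an assumed toy rule)."""
        unique = sorted(set(terms))
        if not unique:
            return 0.0
        overlap = sum(t in self.term_counts for t in unique) / len(unique)
        pairs = list(combinations(unique, 2))
        if not pairs:
            return overlap
        co_occurrence = sum(p in self.pair_counts for p in pairs) / len(pairs)
        return 0.5 * overlap + 0.5 * co_occurrence


class OnlineClusterer:
    """Assigns each text as it arrives; no batch size is needed."""

    def __init__(self, threshold=0.3, max_idle=100):
        self.threshold = threshold  # assumed minimum score to join a cluster
        self.max_idle = max_idle    # assumed idle limit before a cluster is dropped
        self.clusters = {}
        self.next_id = 0
        self.clock = 0

    def process(self, text):
        self.clock += 1
        terms = text.lower().split()

        # Forget topics that have received no documents recently.
        self.clusters = {
            cid: c for cid, c in self.clusters.items()
            if self.clock - c.last_update <= self.max_idle
        }

        # Pick the best-scoring active cluster, if any scores above zero.
        best_id, best_score = None, 0.0
        for cid, cluster in self.clusters.items():
            s = cluster.score(terms)
            if s > best_score:
                best_id, best_score = cid, s

        # Open a new cluster when nothing is similar enough.
        if best_id is None or best_score < self.threshold:
            best_id = self.next_id
            self.next_id += 1
            self.clusters[best_id] = Cluster(self.clock)

        self.clusters[best_id].add(terms, self.clock)
        return best_id


if __name__ == "__main__":
    model = OnlineClusterer(threshold=0.3, max_idle=3)
    for line in ["apple releases new iphone",
                 "new iphone from apple",
                 "football match tonight"]:
        print(model.process(line), line)
```

The pair counts stand in, very loosely, for the idea of using co-occurrence information so that a shared term alone does not decide the assignment, and the idle limit stands in for keeping only active topics. A real model would replace both with principled probabilistic estimates.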

Updated: 2021-09-30