Learning From Short Text Streams With Topic Drifts
IEEE Transactions on Cybernetics (IF 9.4) Pub Date: 2017-09-18, DOI: 10.1109/tcyb.2017.2748598
Peipei Li, Lu He, Haiyan Wang, Xuegang Hu, Yuhong Zhang, Lei Li, Xindong Wu

Short text streams such as search snippets and microblogs have become popular on the Web with the emergence of social media. Unlike traditional text streams, these data are characterized by short length, weak signal, high volume, high velocity, and topic drift. Short text stream classification is hence a challenging and significant task, yet it has received little attention from the research community. Therefore, a new feature extension approach is proposed for short text stream classification with the help of a large-scale semantic network obtained from a Web corpus. It is built on an incremental ensemble classification model for efficiency. First, more semantic contexts based on the senses of terms in short texts are introduced from the open semantic network to compensate for data sparsity; all terms are disambiguated by their semantics to reduce the impact of noise. Second, a concept cluster-based topic drift detection method is proposed to track hidden topic drifts effectively. Finally, extensive studies demonstrate that our approach detects topic drifts effectively compared with several well-known concept drift detection methods for data streams, and that it handles short text streams effectively while maintaining efficiency compared with several state-of-the-art short text classification approaches.
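
The two ideas named in the abstract, extending sparse short texts with concepts from a semantic network and detecting topic drift from changes in those concepts, can be illustrated with a minimal sketch. This is not the authors' algorithm: the tiny `TERM_CONCEPTS` map is a hypothetical stand-in for the large-scale semantic network, no sense disambiguation or ensemble classifier is included, and the cosine-distance test with a fixed `threshold` is an assumed, simplified substitute for the concept cluster-based detector.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy term-to-concept map; the paper uses a large-scale
# semantic network built from a Web corpus, which is not reproduced here.
TERM_CONCEPTS = {
    "apple": ["fruit", "company"],
    "banana": ["fruit"],
    "iphone": ["company", "device"],
    "android": ["device"],
    "goal": ["sport"],
    "match": ["sport"],
}


def extend_features(text):
    """Return the text's terms followed by every concept the toy map lists for them."""
    terms = text.lower().split()
    concepts = [c for t in terms for c in TERM_CONCEPTS.get(t, [])]
    return terms + concepts


def concept_profile(chunk):
    """Count concept occurrences over all short texts in one chunk of the stream."""
    profile = Counter()
    for text in chunk:
        for term in text.lower().split():
            profile.update(TERM_CONCEPTS.get(term, []))
    return profile


def cosine_distance(p, q):
    """1 - cosine similarity of two concept-count profiles (1.0 if either is empty)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def detect_drifts(chunks, threshold=0.5):
    """Return indices of chunks whose concept profile differs sharply from the previous chunk's.

    The threshold is an assumed illustrative value, not one taken from the paper.
    """
    drifts = []
    for i in range(1, len(chunks)):
        previous = concept_profile(chunks[i - 1])
        current = concept_profile(chunks[i])
        if cosine_distance(previous, current) > threshold:
            drifts.append(i)
    return drifts


# Three chunks of a toy stream: two about fruit, then one about sport.
chunks = [
    ["apple banana", "banana apple"],
    ["banana banana", "apple banana"],
    ["goal match", "match goal goal"],
]

print(extend_features("apple iphone"))
print(detect_drifts(chunks))
```

In this toy stream the first two chunks share nearly the same concept profile, while the third has no concepts in common with the second, so only the third chunk is flagged. Comparing concept profiles rather than raw terms is what lets two chunks with few shared words still be recognized as covering the same topic.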

Updated: 2017-09-18