τ-SS3: A text classifier with dynamic n-grams for early risk detection over text streams
Pattern Recognition Letters (IF 3.9), Pub Date: 2020-07-10, DOI: 10.1016/j.patrec.2020.07.001
Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez

A recently introduced classifier, SS3, has been shown to be well suited to early risk detection (ERD) problems on text streams. It obtained state-of-the-art performance on early depression and anorexia detection on Reddit in CLEF's eRisk open tasks. SS3 was designed to handle ERD problems naturally: it supports incremental training and classification over text streams, and it can visually explain its rationale. However, SS3 processes the input with a bag-of-words model and therefore cannot recognize important word sequences. This limitation could hurt classification performance, and it also reduces the descriptiveness of the visual explanations. In standard document classification, word n-grams are commonly used to overcome some of these limitations. Unfortunately, using n-grams on text streams is not trivial, since the system must learn and recognize which n-grams are important "on the fly". This paper introduces τ-SS3, an extension of SS3 that allows it to dynamically recognize useful patterns over text streams. We evaluated our model on the eRisk 2017 and 2018 tasks on early depression and anorexia detection. Experimental results suggest that τ-SS3 can improve both the current results and the richness of the visual explanations.
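
To make the "on the fly" difficulty concrete, here is a minimal Python sketch of one way word n-grams could be counted incrementally and then recognized in new text. It is not the τ-SS3 algorithm, which the abstract does not describe. The class `StreamingNgramCounter`, its methods, and the `max_n` and `min_count` parameters are hypothetical names chosen for illustration, and the simple frequency threshold stands in for whatever importance criterion the paper actually uses.

```python
from collections import defaultdict, deque


class StreamingNgramCounter:
    """Hypothetical helper: counts word n-grams incrementally over a stream."""

    def __init__(self, max_n=3):
        self.max_n = max_n
        self.counts = defaultdict(int)      # n-gram tuple -> times seen
        self.window = deque(maxlen=max_n)   # the last max_n words of the stream

    def learn(self, word):
        """Consume one word and count every n-gram that ends at it."""
        self.window.append(word)
        words = tuple(self.window)
        for n in range(1, len(words) + 1):
            self.counts[words[-n:]] += 1

    def segment(self, words, min_count=2):
        """Greedy longest match: group words into n-grams seen >= min_count times.

        Single words are always accepted, so the loop always advances.
        """
        out = []
        i = 0
        while i < len(words):
            for n in range(min(self.max_n, len(words) - i), 0, -1):
                gram = tuple(words[i:i + n])
                if n == 1 or self.counts.get(gram, 0) >= min_count:
                    out.append(gram)
                    i += n
                    break
        return out


counter = StreamingNgramCounter(max_n=3)
for w in "i feel so alone i feel so tired".split():
    counter.learn(w)

# The trigram ("i", "feel", "so") was seen twice, so it is kept as one unit.
print(counter.segment("i feel so happy".split()))
# [('i', 'feel', 'so'), ('happy',)]
```

The sketch never needs the full text in advance: each word updates the counts as it arrives, which is the property a stream setting requires. A bag-of-words model corresponds to `max_n=1`, where "i feel so" could only be seen as three unrelated words.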




Updated: 2020-07-18